Abstract. The relationship between two important problems in tree pattern matching, the largest common subtree and the smallest common supertree problems, is established by means of simple constructions, which allow one to obtain a largest common subtree of two trees from a smallest common supertree of them, and vice versa. These constructions are the same for isomorphic, homeomorphic, topological, and minor embeddings, they take only time linear in the size of the trees, and they turn out to have a clear algebraic meaning.

1 Introduction

Subtree isomorphism and the related largest common subtree and smallest common supertree problems have practical applications in combinatorial pattern matching [14,19,28], pattern recognition [7,10,25], computational molecular biology [2,20,30], chemical structure search [3,4,11], and other areas of engineering and life sciences. In these areas, they are some of the most widely used techniques for comparing tree-structured data.

Largest common subtree is the problem of finding a largest tree that can be embedded in two given trees, while smallest common supertree is the dual problem of finding a smallest tree into which two given trees can be embedded. A tree $S$ can be embedded in another tree $T$ when there exists an injective mapping $f$ from the nodes of $S$ to the nodes of $T$ that transforms arcs into paths in some specific way. The type of embedding depends on the properties of the mapping $f$. In this paper we consider the following four types of tree embeddings, defined by suitable extra conditions on $f$:

Isomorphic embedding: if there is an arc from $a$ to $b$ in $S$, then there is an arc from $f(a)$ to $f(b)$ in $T$.

Homeomorphic embedding: if there is an arc from $a$ to $b$ in $S$, then there is a path from $f(a)$ to $f(b)$ in $T$ with all intermediate nodes of total degree 2 and no intermediate node belonging to the image of $f$.

Topological embedding: if there is an arc from $a$ to $b$ in $S$, then there is a path from $f(a)$ to $f(b)$ in $T$ with no intermediate node belonging to the image of $f$; and if there are arcs from $a$ to two distinct nodes $b$ and $c$ in $S$, then the paths from $f(a)$ to $f(b)$ and to $f(c)$ in $T$ have no common node other than $f(a)$.

Minor embedding: if there is an arc from $a$ to $b$ in $S$, then there is a path from $f(a)$ to $f(b)$ in $T$ with no intermediate node belonging to the image of $f$.

The different subtree embedding problems of deciding whether a given tree can be em- 
bedded into another given tree, for the different types of embedding defined above, have 
been thoroughly studied in the literature. Their complexity is already settled: they are 
polynomial-time solvable for isomorphic, homeomorphic, and topological embeddings, and 
NP-complete for minor embeddings [8,16,17,18]. Efficient algorithms are known for subtree 
isomorphism [21,26], for subtree homeomorphism [5,27,28], for largest common subtree un- 
der isomorphic embeddings [26] and homeomorphic embeddings [19], and for both largest 
common subtree and smallest common supertree under isomorphic and topological em- 
beddings [12]. The only (exponential) algorithm known for largest common subtree under 
minor embeddings is given in [22]. 

Particular cases of these embedding problems for trees have also been thoroughly stud- 
ied in the literature. On ordered trees, they become polynomial-time solvable for isomor- 
phic, homeomorphic, topological, and also minor embeddings. In this particular case, the 
largest common subtree problem under homeomorphic embeddings is known as the maxi- 
mum agreement subtree problem [1,6,24], the largest common subtree problem under minor 
embeddings is known as the tree edit problem [9,23,31], and the smallest common supertree 
problem under minor embeddings is known as the tree alignment problem [13,15,29]. The 
smallest common supertree problem under minor embeddings was also studied in [18] for 
trees of bounded degree. 

In this paper, we establish in a unified way the relationship between the largest com- 
mon subtree and the smallest common supertree problems for isomorphic, homeomorphic, 
topological, and minor embeddings. A similar correspondence between largest common sub- 
graphs and smallest common supergraphs under isomorphic embeddings was studied in [10]. 
More specifically, we give a simple and unique construction that allows one to obtain in all 
four cases a largest common subtree of two trees from any smallest common supertree of 
them, and vice versa, another simple and unique construction that allows one to obtain in 
all four cases a smallest common supertree of two trees from any largest common subtree of 
them. These constructions take only time linear in the size of the trees, and, moreover, they 
have a clear algebraic meaning: in all four types of embeddings, a largest common subtree of 
two trees is obtained as the pullback of their embeddings into a smallest common supertree, 
and a smallest common supertree of two trees is obtained as the pushout of the embed- 
dings of a largest common subtree into them. This is, to the best of our knowledge, the 
first unified construction showing the relation between largest common subtrees and small- 
est common supertrees for isomorphic, homeomorphic, topological, and minor embeddings. 
These results answer the open problem of establishing the relationship between the largest 
common subtree and the smallest common supertree under any embedding relation, posed 
by the last author in his talk "Subgraph Isomorphism and Related Problems for Restricted 
Graph Classes" at Dagstuhl Seminar 04221, "Robust and Approximative Algorithms on 
Particular Graph Classes," May 23-28, 2004. 



Roughly speaking, our constructions work as follows. Given two trees $T_1$ and $T_2$ and a largest common subtree explicitly embedded into them, a smallest common supertree of $T_1$ and $T_2$ is obtained by first making the disjoint sum of $T_1$ and $T_2$, then merging in this sum each two nodes of $T_1$ and $T_2$ that are related to the same node of the common subtree, and finally removing all parallel arcs and all arcs subsumed by paths. Conversely, given two trees $T_1$ and $T_2$ embedded into a smallest common supertree $T$ of them, a largest common subtree of $T_1$ and $T_2$ is obtained by removing all nodes in $T$ not coming from both $T_1$ and $T_2$, and then replacing by arcs all paths between pairs of remaining nodes that do not contain other remaining nodes. Unfortunately, the justification for these simple constructions, as well as the proof of their algebraic meaning, is rather intricate, and at some points it differs substantially for the different notions of embedding.
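The first of these constructions (disjoint sum, merging, removal of redundant arcs) can be sketched in Python as follows. This is only an illustration of the informal description above, not the construction as it is formally stated and justified later in the paper: the function name `merge_along_common_subtree`, the representation of trees as sets of arcs, and the dictionaries `emb1` and `emb2` (sending each node of the common subtree to its image in each tree) are assumptions made here.

```python
def merge_along_common_subtree(arcs1, arcs2, emb1, emb2):
    """Glue two trees along a common subtree (illustrative sketch).

    arcs1, arcs2: sets of (parent, child) arcs of the two trees.
    emb1, emb2:   dicts mapping each node of the common subtree to its
                  image in the first and in the second tree.
    Returns the set of arcs of the merged graph.
    """
    # Disjoint sum plus merging: nodes coming from the common subtree get
    # one shared name ("m", node); all other nodes are tagged by their tree.
    name1 = {image: ("m", m) for m, image in emb1.items()}
    name2 = {image: ("m", m) for m, image in emb2.items()}
    arcs = {(name1.get(u, (1, u)), name1.get(v, (1, v))) for u, v in arcs1}
    arcs |= {(name2.get(u, (2, u)), name2.get(v, (2, v))) for u, v in arcs2}
    # Using a set already removes parallel arcs.

    children = {}
    for u, v in arcs:
        children.setdefault(u, set()).add(v)

    def subsumed(u, v):
        """True if v is reachable from u by a path with at least two arcs."""
        stack = [w for w in children.get(u, ()) if w != v]
        seen = set(stack)
        while stack:
            w = stack.pop()
            for z in children.get(w, ()):
                if z == v:
                    return True
                if z not in seen:
                    seen.add(z)
                    stack.append(z)
        return False

    # Remove every arc that is subsumed by a longer path.
    return {(u, v) for u, v in arcs if not subsumed(u, v)}


# Hypothetical input: T1 is the single arc r -> a, T2 is the path r -> x -> a,
# and the common subtree R -> A is embedded into both.
merged = merge_along_common_subtree(
    {("r", "a")},
    {("r", "x"), ("x", "a")},
    {"R": "r", "A": "a"},
    {"R": "r", "A": "a"},
)
```

In this small instance the arc coming from the first tree is subsumed by the path coming from the second one, so the merged graph is again the path with three nodes.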

Beyond their theoretical interest, these constructions provide an efficient solution of 
the smallest common supertree problem under homeomorphic embeddings, for which no 
algorithm was known until now. The solution extends the largest common homeomorphic 
subtree algorithm of [19], which in turn extended the subtree homeomorphism algorithm 
of [27,28]. Likewise, these constructions also provide a solution to the smallest common 
supertree problem under minor embeddings, for which no algorithm was known previously, 
either. The solution extends the unordered tree edit algorithm of [22]. 

2 Preliminaries 

In this section we recall the categorical notions of pushouts and pullbacks, as they are needed 
in the following sections, and the notions of isomorphic, homeomorphic, topological, and 
minor embeddings of trees, together with some results about them that will be used in the 
rest of the paper. 



2.1 Pushouts and pullbacks 

A category is a structure consisting of: a class of objects; for every pair of objects $A, B$, a class $\mathrm{Mor}(A,B)$ of morphisms; and, for every three objects $A, B, C$, a binary operation

$$\circ : \mathrm{Mor}(A,B) \times \mathrm{Mor}(B,C) \to \mathrm{Mor}(A,C), \qquad (f,g) \mapsto g \circ f$$

called composition, which satisfies the following two properties:

Associativity: for every $f \in \mathrm{Mor}(A,B)$, $g \in \mathrm{Mor}(B,C)$, and $h \in \mathrm{Mor}(C,D)$, $h \circ (g \circ f) = (h \circ g) \circ f \in \mathrm{Mor}(A,D)$.

Existence of identities: for every object $A$, there exists an identity morphism $\mathrm{Id}_A \in \mathrm{Mor}(A,A)$ such that $\mathrm{Id}_A \circ f = f$, for every $f \in \mathrm{Mor}(B,A)$, and $g \circ \mathrm{Id}_A = g$, for every $g \in \mathrm{Mor}(A,B)$.

It is usual to indicate that $f \in \mathrm{Mor}(A,B)$ by writing $f : A \to B$.

All categories considered in this paper have all trees as objects and different types of 
embeddings of trees as morphisms: see the next subsection. 

A pushout in a category $\mathcal{C}$ of two morphisms $f_1 : A \to B_1$ and $f_2 : A \to B_2$ is an object $P$ together with two morphisms $g_1 : B_1 \to P$ and $g_2 : B_2 \to P$ satisfying the following two conditions:

(i) $g_1 \circ f_1 = g_2 \circ f_2$.
(ii) (Universal property) If $X$ is any object together with a pair of morphisms $g_1' : B_1 \to X$ and $g_2' : B_2 \to X$ such that $g_1' \circ f_1 = g_2' \circ f_2$, then there exists a unique morphism $h : P \to X$ such that $h \circ g_1 = g_1'$ and $h \circ g_2 = g_2'$.

A pullback in a category $\mathcal{C}$ of two morphisms $f_1 : A_1 \to B$ and $f_2 : A_2 \to B$ is an object $Q$ together with two morphisms $g_1 : Q \to A_1$ and $g_2 : Q \to A_2$ satisfying the following two conditions:

(i) $f_1 \circ g_1 = f_2 \circ g_2$.
(ii) (Universal property) If $X$ is any object together with a pair of morphisms $g_1' : X \to A_1$ and $g_2' : X \to A_2$ such that $f_1 \circ g_1' = f_2 \circ g_2'$, then there exists a unique morphism $h : X \to Q$ such that $g_1' = g_1 \circ h$ and $g_2' = g_2 \circ h$.

Two pushouts in $\mathcal{C}$ of the same pair of morphisms, as well as two pullbacks in $\mathcal{C}$ of the same pair of morphisms, are always isomorphic in $\mathcal{C}$.

2.2 Embeddings of trees 

A directed graph is a structure $G = (V, E)$ consisting of a set $V$, whose elements are called nodes, and a set $E$ of ordered pairs $(a, b) \in V \times V$ with $a \neq b$; the elements of $E$ are called arcs. For every arc $(v, w) \in E$, $v$ is its source node and $w$ its target node. A graph is finite if its set of nodes is finite. The in-degree of a node $v$ in a finite graph is the number of arcs that have $v$ as target node and its out-degree is the number of arcs that have $v$ as source node.

An isomorphism $f : G \to G'$ between graphs $G = (V, E)$ and $G' = (V', E')$ is a bijective mapping $f : V \to V'$ such that, for every $a, b \in V$, $(a, b) \in E$ if and only if $(f(a), f(b)) \in E'$.

A path in a directed graph $G = (V, E)$ is a sequence of nodes $(v_0, v_1, \ldots, v_k)$ such that $(v_0, v_1), (v_1, v_2), \ldots, (v_{k-1}, v_k) \in E$; its origin is $v_0$, its end is $v_k$, and its intermediate nodes are $v_1, \ldots, v_{k-1}$. Such a path is non-trivial if $k \geq 1$. We shall represent a path from $a$ to $b$, that is, a path with origin $a$ and end $b$, by $a \leadsto b$.

A (rooted) tree is a directed finite graph $T = (V, E)$ with $V$ either empty or containing a distinguished node $r \in V$, called the root, such that for every other node $v \in V$ there exists one, and only one, path $r \leadsto v$. Note that every node in a tree has in-degree 1, except the root that has in-degree 0. Henceforth, and unless otherwise stated, given a tree $T$ we shall denote its set of nodes by $V(T)$ and its set of arcs by $E(T)$. The size of a tree $T$ is its number $|E(T)|$ of arcs.
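As a concrete reading of this definition, the following Python sketch checks whether a finite directed graph is a rooted tree. The function names and the node/arc-set representation are assumptions made for illustration. Instead of enumerating paths, it uses the characterization noted above (in-degree 0 at the root and 1 elsewhere) together with reachability from the root, which for finite graphs is equivalent to the existence and uniqueness of the paths from the root.

```python
def is_rooted_tree(nodes, arcs, root=None):
    """Check whether (nodes, arcs) is a rooted tree with the given root.

    nodes: iterable of node labels; arcs: iterable of (source, target) pairs.
    The empty graph counts as a tree, as in the definition.
    """
    nodes, arcs = set(nodes), set(arcs)
    if not nodes:
        return not arcs
    if root not in nodes:
        return False
    # Arcs must join two distinct nodes of the graph.
    if any(u == v or u not in nodes or v not in nodes for u, v in arcs):
        return False
    indeg = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for u, v in arcs:
        indeg[v] += 1
        children[u].append(v)
    # Root has in-degree 0, every other node has in-degree 1.
    if indeg[root] != 0 or any(indeg[v] != 1 for v in nodes if v != root):
        return False
    # Every node must be reachable from the root.
    seen, stack = {root}, [root]
    while stack:
        for w in children[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def tree_size(arcs):
    """Size of a tree: its number of arcs."""
    return len(set(arcs))
```

The reachability test is needed in addition to the degree conditions: a graph made of an isolated root and a separate circuit satisfies the degree conditions but is not a tree.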



The children of a node $v$ in a tree $T$ are those nodes $w$ such that $(v, w) \in E(T)$: in this case we also say that $v$ is the parent of its children. The only node without parent is the root, and the nodes without children are the leaves of the tree.

A path $(v_0, v_1, \ldots, v_k)$ in a tree $T$ is elementary if, for every $i = 1, \ldots, k-1$, $v_{i+1}$ is the only child of $v_i$; in other words, if all its intermediate nodes have out-degree 1. In particular, an arc forms an elementary path.

Two non-trivial paths $(a, v_1, \ldots, v_k)$ and $(a, w_1, \ldots, w_n)$ in a tree $T$ are said to diverge if their origin $a$ is their only common node. Note that, by the uniqueness of paths in trees, this condition is equivalent to $v_1 \neq w_1$. The definition of trees also implies that, for every two nodes $b, c$ of a tree that are not connected by a path, there exists one, and only one, node $a$ such that there exist divergent paths $a \leadsto b$ and $a \leadsto c$: we shall call this node the least common ancestor of $b$ and $c$. The adjective "least" refers to the obvious fact that if there exist paths from a node $x$ to $b$ and to $c$, then these paths consist of a path from $x$ to the least common ancestor of $b$ and $c$ followed by the divergent paths from this node to $b$ and $c$.

Definition 1. Let $S$ and $T$ be trees.

(i) $S$ is a minor of $T$ if there exists an injective mapping $f : V(S) \to V(T)$ satisfying the following condition: for every $a, b \in V(S)$, if $(a, b) \in E(S)$, then there exists a path $f(a) \leadsto f(b)$ in $T$ with no intermediate node in $f(V(S))$. In this case, the mapping $f$ is said to be a minor embedding $f : S \to T$.

(ii) $S$ is a topological subtree of $T$ if there exists a minor embedding $f : S \to T$ such that, for every $(a, b), (a, c) \in E(S)$ with $b \neq c$, the paths $f(a) \leadsto f(b)$ and $f(a) \leadsto f(c)$ in $T$ diverge. In this case, $f$ is called a topological embedding $f : S \to T$.

(iii) $S$ is a homeomorphic subtree of $T$ if there exists a minor embedding $f : S \to T$ satisfying the following extra condition: for every $(a, b) \in E(S)$, the path $f(a) \leadsto f(b)$ in $T$ is elementary. In this case, $f$ is said to be a homeomorphic embedding $f : S \to T$.

(iv) $S$ is an isomorphic subtree of $T$ if there exists an injective mapping $f : V(S) \to V(T)$ satisfying the following condition: if $(a, b) \in E(S)$, then $(f(a), f(b)) \in E(T)$. Such a mapping $f$ is called an isomorphic embedding $f : S \to T$.

Lemma 1. Every isomorphic embedding is a homeomorphic embedding, every homeomorphic embedding is a topological embedding, and every topological embedding is a minor embedding.

Proof. It is obvious from the definitions that every isomorphic embedding is a homeomorphic embedding and that every topological embedding is a minor embedding. Now, let $f : S \to T$ be a homeomorphic embedding and let $(a, b), (a, c) \in E(S)$ be such that $b \neq c$. Then, the paths $f(a) \leadsto f(b)$ and $f(a) \leadsto f(c)$ are elementary and they do not contain any intermediate node in $f(V(S))$. This implies that neither $f(b)$ is intermediate in the path $f(a) \leadsto f(c)$, nor $f(c)$ is intermediate in the path $f(a) \leadsto f(b)$. Therefore, $f(b)$ and $f(c)$ are not connected by a path. But then the least common ancestor $x$ of $f(b)$ and $f(c)$ must have out-degree at least 2, and thus it cannot be intermediate in the paths from $f(a)$ to these nodes. Since there exists a path $f(a) \leadsto x$, we conclude that $f(a) = x$, that is, the paths $f(a) \leadsto f(b)$ and $f(a) \leadsto f(c)$ diverge. This shows that $f$ is a topological embedding. $\square$

The implications in the last lemma are strict, as the following example shows. 





Fig. 1. The trees S and T in Example 1 



Example 1. Let $S$ and $T$ be the trees described in Fig. 1, with roots $r$ and 1, respectively.

(a) The mapping $f_0 : V(S) \to V(T)$ defined by $f_0(r) = 1$, $f_0(x) = 3$ and $f_0(y) = 4$ is not a minor embedding, because, although it transforms arcs in $S$ into paths in $T$, the path $f_0(r) \leadsto f_0(y)$ contains the node $3 = f_0(x)$, which belongs to $f_0(V(S))$.

(b) The mapping $f_1 : V(S) \to V(T)$ defined by $f_1(r) = 1$, $f_1(x) = 5$ and $f_1(y) = 6$ is a minor embedding, because the arcs $(r, x), (r, y) \in E(S)$ become paths $f_1(r) \leadsto f_1(x)$ and $f_1(r) \leadsto f_1(y)$ in $T$ with no intermediate node in $f_1(V(S))$. But it is not a topological embedding, because these paths do not diverge.

(c) The mapping $f_2 : V(S) \to V(T)$ defined by $f_2(r) = 1$, $f_2(x) = 2$ and $f_2(y) = 6$ is a topological embedding, because the arcs $(r, x), (r, y) \in E(S)$ become divergent paths $f_2(r) \leadsto f_2(x)$ and $f_2(r) \leadsto f_2(y)$ in $T$ without intermediate nodes in $f_2(V(S))$. But it is not a homeomorphic embedding, because the path $f_2(r) \leadsto f_2(y)$ contains an intermediate node with more than one child.

(d) The mapping $f_3 : V(S) \to V(T)$ defined by $f_3(r) = 1$, $f_3(x) = 2$ and $f_3(y) = 4$ is a homeomorphic embedding, because the arcs $(r, x), (r, y) \in E(S)$ become elementary paths $f_3(r) \leadsto f_3(x)$ and $f_3(r) \leadsto f_3(y)$ in $T$ with no intermediate node in $f_3(V(S))$. But it is not an isomorphic embedding, because the path $f_3(r) \leadsto f_3(y)$ is not an arc.

(e) The mappings $f_4 : V(S) \to V(T)$ defined by $f_4(r) = 1$, $f_4(x) = 2$ and $f_4(y) = 3$, and $f_5 : V(S) \to V(T)$ defined by $f_5(r) = 4$, $f_5(x) = 5$ and $f_5(y) = 6$ are isomorphic embeddings, because they transform every arc in $S$ into an arc in $T$.
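The four notions in Definition 1 can be tested mechanically. The Python sketch below classifies a node mapping as the strongest type of embedding it satisfies, relying on Lemma 1 for the ordering. The function names and the child-to-parent dictionary representation are assumptions made here. Since the figure is not reproduced in this text, the tree `PARENT_T` is a reconstruction: it is the tree with arcs (1,2), (1,3), (3,4), (4,5), (4,6), which is consistent with every statement in Example 1, but it should be read as an assumption rather than as the figure itself.

```python
def path_between(parent, a, b):
    """Return the unique path [a, ..., b] in the tree, or None if there is none."""
    path = [b]
    while path[-1] != a:
        if path[-1] not in parent:
            return None
        path.append(parent[path[-1]])
    return path[::-1]


def classify_embedding(parent_s, parent_t, f):
    """Strongest embedding type of f : V(S) -> V(T), or None if f is no embedding.

    parent_s, parent_t: dicts mapping each non-root node to its parent.
    f: dict mapping every node of S to a node of T.
    """
    image = set(f.values())
    if len(image) != len(f):
        return None  # not injective
    out_deg = {}
    for par in parent_t.values():
        out_deg[par] = out_deg.get(par, 0) + 1
    paths = {}
    for b, a in parent_s.items():  # (a, b) is an arc of S
        p = path_between(parent_t, f[a], f[b])
        if p is None or any(v in image for v in p[1:-1]):
            return None  # not even a minor embedding
        paths[(a, b)] = p
    if all(len(p) == 2 for p in paths.values()):
        return "isomorphic"
    if all(out_deg.get(v, 0) == 1 for p in paths.values() for v in p[1:-1]):
        return "homeomorphic"
    # Paths with the same origin diverge iff their second nodes differ.
    second_nodes = {}
    for (a, _), p in paths.items():
        second_nodes.setdefault(a, []).append(p[1])
    if all(len(set(s)) == len(s) for s in second_nodes.values()):
        return "topological"
    return "minor"


# S has root r with children x and y; T is reconstructed from Example 1.
PARENT_S = {"x": "r", "y": "r"}
PARENT_T = {2: 1, 3: 1, 4: 3, 5: 4, 6: 4}
```

For instance, `classify_embedding(PARENT_S, PARENT_T, {"r": 1, "x": 5, "y": 6})` evaluates the mapping of item (b).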

The following lemmas will be used several times in the next sections. 



Lemma 2. Let $f : S \to T$ be a minor embedding. For every $a, b \in V(S)$, there exists a path $a \leadsto b$ in $S$ if and only if there exists a path $f(a) \leadsto f(b)$ in $T$. Moreover, if the path $f(a) \leadsto f(b)$ is elementary, then the path $a \leadsto b$ is also elementary, and if there is an arc from $f(a)$ to $f(b)$ in $T$, then there is an arc from $a$ to $b$ in $S$.

Proof. Since the arcs in $S$ become paths in $T$ without intermediate nodes in $f(V(S))$, it is obvious that a path $a \leadsto b$ in $S$ becomes, under $f$, a path $f(a) \leadsto f(b)$ in $T$ whose intermediate nodes belonging to $f(V(S))$ are exactly the images under $f$ of the intermediate nodes of the path $a \leadsto b$.

Assume now that there exists a path $f(a) \leadsto f(b)$ in $T$, and let $r$ be the root of $S$. If $a = r$ or $a = b$, it is clear that there exists a path $a \leadsto b$ in $S$. If $a \neq r$ and $a \neq b$, then the images of the paths $r \leadsto a$ and $r \leadsto b$ in $S$ are paths $f(r) \leadsto f(a)$ and $f(r) \leadsto f(b)$ in $T$. Now, the uniqueness of paths in $T$ implies that the path $f(r) \leadsto f(b)$ splits into the path $f(r) \leadsto f(a)$ and the path $f(a) \leadsto f(b)$. Therefore, $f(a)$ is an intermediate node of the path $f(r) \leadsto f(b)$. As a consequence, since $f$ is injective and any intermediate node of this path belonging to $f(V(S))$ must be the image under $f$ of an intermediate node of the path $r \leadsto b$, the node $a$ must be intermediate in the path $r \leadsto b$, which yields a path $a \leadsto b$ in $S$.

Moreover, if a node in $S$ has more than one child, then its image under $f$ also has more than one child. This implies that if the path $f(a) \leadsto f(b)$ is elementary, then the path $a \leadsto b$ is elementary, too. Finally, if there is an arc from $f(a)$ to $f(b)$, then the path $a \leadsto b$ cannot have any intermediate node: it must be an arc. $\square$

By Lemma 1, the last lemma applies also to isomorphic, homeomorphic, and topological 
embeddings. 

Lemma 3. Let $f : S \to T$ be a topological embedding. For every $a, b \in V(S)$ not connected by a path, if $x$ is their least common ancestor in $S$, then $f(x)$ is the least common ancestor of $f(a)$ and $f(b)$ in $T$.

Proof. Since $a$ and $b$ are not connected by a path in $S$, by the last lemma we know that $f(a)$ and $f(b)$ are not connected by a path in $T$, either. Let now $x$ be the least common ancestor of $a$ and $b$ in $S$, and let $v$ and $w$ be the children of $x$ contained in the divergent paths $x \leadsto a$ and $x \leadsto b$, respectively. Then, since $f$ is a topological embedding, there exist in $T$ divergent paths $f(x) \leadsto f(v)$ and $f(x) \leadsto f(w)$, which are followed by paths $f(v) \leadsto f(a)$ and $f(w) \leadsto f(b)$, respectively. This means that $f(x)$ is the node in $T$ from which there exist divergent paths to $f(a)$ and to $f(b)$, that is, the least common ancestor of these two nodes. $\square$

By Lemma 1, the last lemma applies also to isomorphic and homeomorphic embeddings. But the thesis of this lemma need not hold if $f$ is only a minor embedding: see, for instance, Example 1.(b), where $r$ is the least common ancestor of $x$ and $y$, but the least common ancestor of $f_1(x) = 5$ and $f_1(y) = 6$ is 4, and not $f_1(r) = 1$.
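The remark on Example 1.(b) can be checked with a short Python sketch that computes least common ancestors by walking up parent pointers. The function names and the child-to-parent representation are assumptions made for illustration, and `PARENT_T` is again the tree with arcs (1,2), (1,3), (3,4), (4,5), (4,6), reconstructed from the statements of Example 1 because the figure is not available here.

```python
def ancestors(parent, v):
    """Return the chain [v, parent(v), ..., root]."""
    chain = [v]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def least_common_ancestor(parent, b, c):
    """Deepest node from which there are paths to both b and c.

    For nodes not connected by a path this is the least common ancestor
    in the sense defined in the text.
    """
    above_b = set(ancestors(parent, b))
    for v in ancestors(parent, c):
        if v in above_b:
            return v
    return None


PARENT_S = {"x": "r", "y": "r"}
PARENT_T = {2: 1, 3: 1, 4: 3, 5: 4, 6: 4}  # reconstructed, see above

f1 = {"r": 1, "x": 5, "y": 6}  # minor but not topological embedding
f2 = {"r": 1, "x": 2, "y": 6}  # topological embedding

lca_s = least_common_ancestor(PARENT_S, "x", "y")
# Does the image of the least common ancestor agree with the
# least common ancestor of the images?
preserved_f1 = f1[lca_s] == least_common_ancestor(PARENT_T, f1["x"], f1["y"])
preserved_f2 = f2[lca_s] == least_common_ancestor(PARENT_T, f2["x"], f2["y"])
```

Under these assumptions, the comparison fails for the minor embedding and succeeds for the topological one, in line with Lemma 3 and the remark above.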

Lemma 4. Every bijective minor embedding is an isomorphism of graphs.

Proof. Let $f : S \to T$ be a minor embedding such that $f : V(S) \to V(T)$ is bijective, and let $a, b \in V(S)$. If $(a, b) \in E(S)$, then there exists a path $f(a) \leadsto f(b)$ in $T$ without any intermediate node in $f(V(S))$. Since $f$ is bijective, this means that this path has no intermediate node, and thus it is an arc. This proves that if $(a, b) \in E(S)$, then $(f(a), f(b)) \in E(T)$. The converse implication is given by Lemma 2. $\square$

By Lemma 1, the last lemma implies that every bijective isomorphic, homeomorphic, 
or topological embedding is an isomorphism of graphs. 

Definition 2. Let $S$ and $T$ be trees.

(i) A largest common isomorphic subtree (homeomorphic subtree, topological subtree, minor) of $S$ and $T$ is a tree that is an isomorphic subtree (respectively, homeomorphic subtree, topological subtree, minor) of both of them and has the largest size among all trees with this property.

(ii) A smallest common isomorphic supertree (homeomorphic supertree, topological supertree, supertree under minor embeddings) of $S$ and $T$ is a tree such that both $S$ and $T$ are isomorphic subtrees (respectively, homeomorphic subtrees, topological subtrees, minors) of it and has the least size among all trees with this property.

We shall denote by $\mathrm{Tree}_{iso}$, $\mathrm{Tree}_{hom}$, $\mathrm{Tree}_{top}$, and $\mathrm{Tree}_{min}$ the categories with objects all trees and with morphisms the isomorphic, homeomorphic, topological, and minor embeddings, respectively. Whenever we denote generically any one of these categories by $\mathrm{Tree}_*$, we shall use the following notations. By a $\mathrm{Tree}_*$-embedding we shall mean a morphism in the corresponding category. By a common $\mathrm{Tree}_*$-subtree of two trees we shall mean a tree together with $\mathrm{Tree}_*$-embeddings into these two trees. By a largest common $\mathrm{Tree}_*$-subtree of two trees we shall mean a largest size common $\mathrm{Tree}_*$-subtree. By a common $\mathrm{Tree}_*$-supertree of two trees we shall mean a tree together with $\mathrm{Tree}_*$-embeddings of these two trees into it. By a smallest common $\mathrm{Tree}_*$-supertree of two trees we shall mean a least size common $\mathrm{Tree}_*$-supertree. And by a $\mathrm{Tree}_*$-path we shall understand an arc if $\mathrm{Tree}_*$ stands for $\mathrm{Tree}_{iso}$, an elementary path if $\mathrm{Tree}_*$ denotes $\mathrm{Tree}_{hom}$, and an arbitrary path if $\mathrm{Tree}_*$ means $\mathrm{Tree}_{top}$ or $\mathrm{Tree}_{min}$. Note in particular that all trivial paths and all arcs are $\mathrm{Tree}_*$-paths, for every category $\mathrm{Tree}_*$.

The following corollary is a simple rewriting of the definitions. 

Corollary 1. Let $\mathrm{Tree}_*$ denote any category $\mathrm{Tree}_{iso}$, $\mathrm{Tree}_{hom}$, or $\mathrm{Tree}_{min}$. For every pair of trees $S, T$, a mapping $f : V(S) \to V(T)$ is a $\mathrm{Tree}_*$-embedding if and only if it is injective and, for every $(a, b) \in E(S)$, there is a $\mathrm{Tree}_*$-path $f(a) \leadsto f(b)$ in $T$ with no intermediate node belonging to $f(V(S))$.

And the following corollary is a direct consequence of Lemma 2.

Corollary 2. Let $\mathrm{Tree}_*$ be any category $\mathrm{Tree}_{iso}$, $\mathrm{Tree}_{hom}$, $\mathrm{Tree}_{top}$, or $\mathrm{Tree}_{min}$, and let $f : S \to T$ be a $\mathrm{Tree}_*$-embedding. For every $a, b \in V(S)$, if there exists a $\mathrm{Tree}_*$-path $f(a) \leadsto f(b)$ in $T$, then there exists a $\mathrm{Tree}_*$-path $a \leadsto b$ in $S$.



Finally, we have the following result, which will be used later. 

Lemma 5. Let $\mathrm{Tree}_*$ be any category $\mathrm{Tree}_{iso}$, $\mathrm{Tree}_{hom}$, $\mathrm{Tree}_{top}$, or $\mathrm{Tree}_{min}$, let $S, T, U$ be trees and $f : V(S) \to V(T)$ and $g : V(T) \to V(U)$ mappings between their sets of nodes. If $g \circ f : S \to U$ and $g : T \to U$ are $\mathrm{Tree}_*$-embeddings, then $f : S \to T$ is also a $\mathrm{Tree}_*$-embedding.

Proof. Since $g \circ f$ is injective, it is clear that $f$ is injective. Let now $a, b \in V(S)$ be such that $(a, b) \in E(S)$. Since $g \circ f : S \to U$ is a $\mathrm{Tree}_*$-embedding, there exists a $\mathrm{Tree}_*$-path $g(f(a)) \leadsto g(f(b))$ in $U$ without any intermediate node in $g(f(V(S)))$. Since $g : T \to U$ is a $\mathrm{Tree}_*$-embedding, the existence of this path $g(f(a)) \leadsto g(f(b))$ in $U$ implies, by Corollary 2, the existence of a $\mathrm{Tree}_*$-path $f(a) \leadsto f(b)$ in $T$. This path cannot have any intermediate node in $f(V(S))$, because any such intermediate node would become, under $g$, an intermediate node belonging to $g(f(V(S)))$ in the path $g(f(a)) \leadsto g(f(b))$.

So, $f$ is injective and if $(a, b) \in E(S)$, then there exists a $\mathrm{Tree}_*$-path $f(a) \leadsto f(b)$ in $T$ without intermediate nodes in $f(V(S))$. This already shows, by Corollary 1, that $f$ is a $\mathrm{Tree}_*$-embedding when $\mathrm{Tree}_*$ stands for $\mathrm{Tree}_{iso}$, $\mathrm{Tree}_{hom}$, or $\mathrm{Tree}_{min}$.

As far as $\mathrm{Tree}_{top}$ goes, we have already proved that $f$ transforms arcs into paths without intermediate nodes in $f(V(S))$, and thus it remains to prove that if $a, b, c \in V(S)$ are such that $(a, b), (a, c) \in E(S)$ and $b \neq c$, then the paths $f(a) \leadsto f(b)$ and $f(a) \leadsto f(c)$ in $T$ diverge. But since $g \circ f$ is a topological embedding, the paths $g(f(a)) \leadsto g(f(b))$ and $g(f(a)) \leadsto g(f(c))$ in $U$ are divergent, and this clearly implies that the paths $f(a) \leadsto f(b)$ and $f(a) \leadsto f(c)$ in $T$ are divergent, too: any common intermediate node in these paths would become, under $g$, a common intermediate node in the paths $g(f(a)) \leadsto g(f(b))$ and $g(f(a)) \leadsto g(f(c))$. $\square$

3 Common subtrees as pullbacks 

In this section we study the construction of common subtrees as pullbacks of embeddings 
into common supertrees, for each one of the types of tree embeddings considered in this 
paper. We start with the most general type, minor embeddings. 

Let $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$ be henceforth two minor embeddings. Without any loss of generality, and unless otherwise stated, we shall assume that $V(T_1), V(T_2) \subseteq V(T)$ and that the minor embeddings $f_1$ and $f_2$ are given by these inclusions. For simplicity, we shall thus denote the image of a node $a \in V(T_i)$ under the corresponding $f_i$ again by $a$.

Let $T_p$ be the graph with set of nodes $V(T_p) = V(T_1) \cap V(T_2)$ and set of arcs defined in the following way: for every $a, b \in V(T_1) \cap V(T_2)$, $(a, b) \in E(T_p)$ if and only if there are paths $a \leadsto b$ in $T_1$ and in $T_2$ without intermediate nodes in $V(T_1) \cap V(T_2)$. We shall call this graph $T_p$ the intersection of $T_1$ and $T_2$ obtained through $f_1$ and $f_2$.

This graph satisfies the following useful lemma. 

Lemma 6. For every $a, b \in V(T_1) \cap V(T_2)$:

(i) If there exists a path $a \leadsto b$ in $T_p$, then there exist paths $a \leadsto b$ in $T_1$ and in $T_2$.

(ii) If there exists a path $a \leadsto b$ in some $T_i$, $i = 1, 2$, then there exists also a path $a \leadsto b$ in $T_p$, and its intermediate nodes are exactly the intermediate nodes of the path $a \leadsto b$ in $T_i$ that belong to $V(T_1) \cap V(T_2)$.

Proof. Point (i) is a direct consequence of the fact that every arc in $T_p$ corresponds to paths in $T_1$ and $T_2$.

As far as point (ii) goes, we shall prove that if there exists a non-trivial path $a \leadsto b$ in $T_1$, then there exists also a path $a \leadsto b$ in $T_p$ with intermediate nodes the intermediate nodes of the path in $T_1$ that belong to $V(T_1) \cap V(T_2)$, by induction on the number $n$ of such intermediate nodes belonging to $V(T_1) \cap V(T_2)$; the case of a trivial path is obvious, and the case of $T_2$ is symmetric.

If $n = 0$, then there exists a path $a \leadsto b$ in $T_1$ that does not contain any intermediate node in $V(T_1) \cap V(T_2)$. Since $f_1$ transforms arcs into paths with no intermediate node belonging to $V(T_1)$, this implies that there exists a path $a \leadsto b$ in $T$ that does not contain any intermediate node in $V(T_1) \cap V(T_2)$, either. Then, by Lemma 2, this path is induced by a path $a \leadsto b$ in $T_2$, and by the same reason this path does not contain any intermediate node in $V(T_1) \cap V(T_2)$. So, there are paths $a \leadsto b$ in $T_1$ and $T_2$ without intermediate nodes in $V(T_1) \cap V(T_2)$, and therefore, by definition, there exists an arc from $a$ to $b$ in $T_p$.

As the induction hypothesis, assume that the claim is true for paths in $T_1$ with $n$ intermediate nodes in $V(T_1) \cap V(T_2)$, and assume now that the path $a \leadsto b$ has $n + 1$ such nodes. Let $a_0$ be the first intermediate node of this path belonging to $V(T_1) \cap V(T_2)$. Then, by the case $n = 0$, there is an arc in $T_p$ from $a$ to $a_0$, and by the induction hypothesis there is a path $a_0 \leadsto b$ in $T_p$ whose only intermediate nodes are the intermediate nodes of the path $a_0 \leadsto b$ in $T_1$ that belong to $V(T_1) \cap V(T_2)$; by concatenating these paths in $T_p$ we obtain the path $a \leadsto b$ we were looking for. $\square$

The intersection of two minors need not be a tree, as the following simple example 
shows. 

Example 2. Let $T$ be a tree with nodes $a_1, a_2, b, c$ and arcs $(a_1, a_2), (a_2, b), (a_2, c)$, let $T_1$ be its minor with nodes $a_1, b, c$ and arcs $(a_1, b), (a_1, c)$, and let $T_2$ be its minor with nodes $a_2, b, c$ and arcs $(a_2, b), (a_2, c)$. In this case $T_p$ is the graph with nodes $b, c$ and no arc, and in particular it is not a tree.
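The intersection graph can be computed directly from its definition. The Python sketch below is an illustration under assumptions made here: the function names are hypothetical, each tree is given by its node set and a child-to-parent dictionary, node labels are shared with the common supertree (as in the convention adopted above), and `None` is not used as a node label. It uses the observation that a path $a \leadsto b$ in a tree with no intermediate node in the common node set exists exactly when $a$ is the nearest proper ancestor of $b$ lying in that set.

```python
def nearest_common_ancestor_in(parent, b, common):
    """Nearest proper ancestor of b that belongs to `common`, or None."""
    v = parent.get(b)
    while v is not None and v not in common:
        v = parent.get(v)
    return v


def intersection_graph(nodes1, parent1, nodes2, parent2):
    """Nodes and arcs of the intersection of two trees embedded in a common tree."""
    common = set(nodes1) & set(nodes2)
    arcs = set()
    for b in common:
        a1 = nearest_common_ancestor_in(parent1, b, common)
        a2 = nearest_common_ancestor_in(parent2, b, common)
        # (a, b) is an arc iff the required paths exist in both trees.
        if a1 is not None and a1 == a2:
            arcs.add((a1, b))
    return common, arcs


# The two minors of Example 2.
nodes_p, arcs_p = intersection_graph(
    {"a1", "b", "c"}, {"b": "a1", "c": "a1"},
    {"a2", "b", "c"}, {"b": "a2", "c": "a2"},
)
```

Each node walks up its ancestor chain once per tree, so this naive version is quadratic in the worst case; it is meant to mirror the definition, not to achieve the linear running time mentioned in the abstract.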

Now we have the following result. 

Proposition 1. $T_1$ and $T_2$ always have a common minor, which is either $T_p$ together with its inclusions in $T_1$ and $T_2$, or obtained by adding a root to $T_p$.

Proof. If $T_p$ is empty, then it is a tree and its inclusions into $T_1$ and $T_2$ are clearly minor embeddings. In this case, $T_p$ is a common minor of $T_1$ and $T_2$.

So, assume in the sequel that $T_p$ is non-empty. If it had no node without parents, then it would contain a circuit and this would imply, by Lemma 6.(i), the existence of circuits in the trees $T_1$ and $T_2$, which is impossible. Therefore, $T_p$ contains nodes without parent. Now we must consider two cases:

(1) $T_p$ has only one node $r_p$ without a parent. Then every other node $a$ in $T_p$ can be reached from $r_p$ through a path, because this graph does not contain any circuit (as we have seen) and hence it must contain a path from a node of in-degree 0 to $a$. To check that this path is unique, we shall prove that no node in $T_p$ has in-degree greater than 1. Indeed, assume that there are nodes $a, b, c \in V(T_p)$, with $b \neq c$, and arcs from $b$ and $c$ to $a$. This means that there are paths in $T_1$ and in $T_2$ from $b$ and $c$ to $a$ that do not contain any intermediate node in $V(T_1) \cap V(T_2)$. But since, say, $T_1$ is a tree, if there exist paths $b \leadsto a$ and $c \leadsto a$ in $T_1$, one of the nodes $b$ or $c$ must be intermediate in the path from the other one to $a$, which yields a contradiction.

This proves that, in this case, $T_p$ is a tree. And by definition, for every $a, b \in V(T_p)$, if $(a, b) \in E(T_p)$, then there are paths $a \leadsto b$ in $T_1$ and in $T_2$ without any intermediate node in $V(T_p)$. Therefore, the inclusions $\iota_i : V(T_p) \hookrightarrow V(T_i)$ induce minor embeddings $\iota_i : T_p \to T_i$, for $i = 1, 2$, and hence $T_p$ is a common minor of $T_1$ and $T_2$.

(2) $T_p$ contains more than one node without a parent, say $x_1, \ldots, x_k$. The same argument used in (1) shows in this case that every other node $a \in V(T_p)$ can be reached from one of these nodes $x_i$ through a path in $T_p$, and that no node in $T_p$ has in-degree greater than 1.

Let now $\widetilde{T}_p$ be the graph obtained by adding to $T_p$ one node $r$ and arcs $(r, x_i)$, for $i = 1, \ldots, k$. Then, $r$ is the only node without a parent in $\widetilde{T}_p$ and every node in it is reached from $r$ through a unique path. Indeed, each $x_i$ is reached from $r$ through the new arc $(r, x_i)$, and then every other node in $\widetilde{T}_p$ is reached from $r$ by the path going from some $x_i$ to it in $T_p$ preceded by the arc from $r$ to this $x_i$. And these paths are unique, because no node in $\widetilde{T}_p$ has in-degree greater than 1. Therefore, $\widetilde{T}_p$ is a tree with root $r$.

Now, note that there is no non-trivial path in either $T_1$ or $T_2$ from any node belonging to $V(T_1) \cap V(T_2)$ to any $x_i$: such a path, by Lemma 6, would induce a non-trivial path in $T_p$ and therefore the node $x_i$ would have a parent in $T_p$. This implies in particular that neither the root of $T_1$ nor the root of $T_2$ belongs to $V(T_1) \cap V(T_2)$: since $k \geq 2$, there are non-trivial paths from each one of these roots to some $x_i$.

Consider then the injective mappings $\widetilde{\iota}_i : \widetilde{T}_p \to T_i$, $i = 1, 2$, defined by the inclusions on $V(T_p)$ and sending $r$ to the root of the corresponding $T_i$. It is clear that they are minor embeddings: on the one hand, arguing as in (1) above, we obtain that the restriction of each $\widetilde{\iota}_i$ to $T_p$ sends every arc to a path in $T_i$ without any intermediate node coming from $T_p$; on the other hand, $\widetilde{\iota}_i$ sends every arc $(r, x_j)$ to the path in $T_i$ going from its root to $x_j$, which, as we saw above, does not contain any intermediate node in $V(T_1) \cap V(T_2)$. Thus, $\widetilde{T}_p$ is a common minor of $T_1$ and $T_2$. $\square$
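The case analysis in this proof translates into a short procedure. The following Python sketch takes the nodes and arcs of an intersection graph and returns the common minor described in the proposition, adding a fresh root only when there are several nodes without a parent. The function name and the default label `"r*"` for the new root are assumptions made for illustration; the input is assumed to come from two minors of a common tree, so that the properties established in the proof hold.

```python
def common_minor_from_intersection(nodes, arcs, new_root="r*"):
    """Build the common minor of Proposition 1 from an intersection graph.

    nodes, arcs: node set and arc set of the intersection graph.
    Returns (nodes, arcs, root); root is None for the empty tree.
    """
    nodes, arcs = set(nodes), set(arcs)
    if not nodes:
        return nodes, arcs, None  # the empty tree
    with_parent = {child for _, child in arcs}
    parentless = nodes - with_parent
    if not parentless:
        raise ValueError("input contains a circuit, so it is not an intersection graph")
    if len(parentless) == 1:
        # Case (1): the intersection graph is already a tree.
        return nodes, arcs, next(iter(parentless))
    # Case (2): add a new root with an arc to every parentless node.
    if new_root in nodes:
        raise ValueError("the label chosen for the new root is already in use")
    nodes.add(new_root)
    arcs |= {(new_root, x) for x in parentless}
    return nodes, arcs, new_root


# Example 2: the intersection graph has nodes b, c and no arcs.
minor_nodes, minor_arcs, minor_root = common_minor_from_intersection({"b", "c"}, set())
```

In the second case the embeddings into the two trees send the new root to the root of each tree, as in the proof; the sketch only builds the tree and leaves those mappings implicit.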

If we restrict ourselves from minor embeddings to topological embeddings, then only 
the first case in the last proposition can happen. 

Proposition 2. If fi : Ti ^ T and /2 : ^2 ^ T are topological embeddings, then Tp is a 
tree and the inclusions V(Tp) ^^ V{Ti) are topological embeddings Li : Tp ^ Ti, for i = 1,2, 
and therefore Tp is a common topological subtree ofTi and T2. 



Proof. Let us prove first of all that if $f_1$ and $f_2$ are not only minor but topological
embeddings, then $T_p$ does not have more than one node without a parent. Indeed, assume that
$a, b \in V(T_1) \cap V(T_2)$ have no parent in $T_p$. Then, neither $T_1$ nor $T_2$ contains any non-trivial
path from some node in $V(T_1) \cap V(T_2)$ to $a$ or $b$, because, by Lemma 6, such a path would
imply a non-trivial path in $T_p$ finishing in $a$ or $b$ and then one of these nodes would have a
parent in $T_p$. In particular, there is no path connecting $a$ and $b$ in either $T_1$ or $T_2$. For every
$i = 1, 2$, let $x_i \in V(T_i)$ be the least common ancestor of $a$ and $b$ in $T_i$. By Lemma 3, each
$x_i$ is also the least common ancestor of $a$ and $b$ in $T$. But then $x_1 = x_2 \in V(T_1) \cap V(T_2)$
and therefore both $a$ and $b$ can be reached from a node in $V(T_1) \cap V(T_2)$ through paths in
$T_1$ and in $T_2$, which yields a contradiction.

So, since every topological embedding is a minor embedding, from the proof of Proposition 1
we know that the fact that $T_p$ has at most (and hence, exactly) one node without
a parent implies that it is a tree. Let us prove now that $\iota_1$ is a topological embedding. By
point (1) in the proof of Proposition 1, we already know that it is a minor embedding. So,
it remains to prove that if there are arcs from $a$ to $b$ and to $c$ in $T_p$, then the paths $a \rightsquigarrow b$
and $a \rightsquigarrow c$ in $T_1$ diverge.

To prove it, note that, since, by the definition of $T_p$, the paths $a \rightsquigarrow b$ and $a \rightsquigarrow c$ in $T_1$ and
in $T_2$ have no intermediate node in $V(T_1) \cap V(T_2)$, neither $b$ nor $c$ appears in the path from
$a$ to the other one, and therefore there is no path connecting $b$ and $c$. Thus, if, for every
$i = 1, 2$, $x_i \in V(T_i)$ denotes the least common ancestor of $b$ and $c$ in $T_i$, then, arguing as
before, we deduce that $x_1 = x_2$ and in particular that this node belongs to $V(T_1) \cap V(T_2)$.

Now, the existence of the paths $a \rightsquigarrow b$ and $a \rightsquigarrow c$ in $T_1$ implies that either $x_1 = a$ or
there exists a non-trivial path in $T_1$ from $a$ to $x_1$. But the paths $a \rightsquigarrow b$ and $a \rightsquigarrow c$ in $T_1$
do not contain any intermediate node belonging to $V(T_1) \cap V(T_2)$, and therefore it must
happen that $a = x_1$ and the paths $a \rightsquigarrow b$ and $a \rightsquigarrow c$ in $T_1$ diverge, as we wanted to prove. □
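
As an illustration only, the intersection $T_p$ used in these propositions can be computed directly from the characterization recalled in the proofs: $(a, b)$ is an arc of $T_p$ when the paths $a \rightsquigarrow b$ in $T_1$ and in $T_2$ exist and have no intermediate node in $V(T_1) \cap V(T_2)$. The Python sketch below assumes that both trees are given as parent maps over node labels already identified inside $T$, so that the embeddings are inclusions; the function names and this representation are assumptions made for the example, not part of the paper.

```python
def nearest_shared_ancestor(parent, node, shared):
    """Return the closest proper ancestor of `node` lying in `shared`, or None."""
    current = parent.get(node)
    while current is not None and current not in shared:
        current = parent.get(current)
    return current


def intersection_graph(parent1, parent2):
    """Sketch of T_p for two trees given as {node: parent or None} maps.

    Assumption: node labels of both trees are already identified inside
    a common tree T, so V(T_1) and V(T_2) can be intersected directly.
    """
    shared = set(parent1) & set(parent2)
    arcs = set()
    for b in shared:
        a1 = nearest_shared_ancestor(parent1, b, shared)
        a2 = nearest_shared_ancestor(parent2, b, shared)
        # (a, b) is an arc only if the same shared node a is reached in both
        # trees without crossing any other shared node on the way.
        if a1 is not None and a1 == a2:
            arcs.add((a1, b))
    return shared, arcs
```

For instance, with `parent1 = {1: None, 2: 1, 3: 2, 4: 1}` and `parent2 = {1: None, 3: 1, 5: 1, 4: 5}`, the shared nodes are 1, 3 and 4, and both 3 and 4 hang from 1 in the resulting graph.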

We have similar results if $f_1$ and $f_2$ are not only topological, but homeomorphic or
isomorphic embeddings.

Proposition 3. If $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$ are homeomorphic embeddings, then $T_p$ is
a tree and the inclusions $V(T_p) \hookrightarrow V(T_i)$ are homeomorphic embeddings $\iota_i : T_p \to T_i$, for
$i = 1, 2$, and therefore $T_p$ is a common homeomorphic subtree of $T_1$ and $T_2$.

Proof. We already know from Proposition 2 that $T_p$ is a tree and that the inclusions
$\iota_1 : T_p \to T_1$ and $\iota_2 : T_p \to T_2$ are topological embeddings. It remains to prove that they are
not only topological, but homeomorphic embeddings. We shall do it for $\iota_1 : T_p \to T_1$.

Let $a, b \in V(T_p)$ be such that $(a, b) \in E(T_p)$. Then, by definition, there exists a path
$a \rightsquigarrow b$ in $T_1$ without any intermediate node in $V(T_1) \cap V(T_2)$. Assume that this path has
an intermediate node $x$ with more than one child. The path $a \rightsquigarrow b$ induces, under the
homeomorphic embedding $f_1 : T_1 \to T$, a path $a \rightsquigarrow b$ in $T$ that contains $x$, and this node
has also more than one child in $T$. Now, by Lemma 2, there is also a path $a \rightsquigarrow b$ in $T_2$. Since
every arc in $T_2$ becomes, under the homeomorphic embedding $f_2 : T_2 \to T$, an elementary
path in $T$, the nodes in the path $a \rightsquigarrow b$ in $T$ that do not belong to $V(T_2)$ have only one
child. Therefore, $x \in V(T_2)$ and hence $x \in V(T_1) \cap V(T_2)$, which contradicts the fact that
the path $a \rightsquigarrow b$ in $T_1$ does not contain any intermediate node in $V(T_1) \cap V(T_2)$. This proves
that this path is elementary, as we wanted. □

Proposition 4. If $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$ are isomorphic embeddings, then $T_p$ is a
tree and the inclusions $V(T_p) \hookrightarrow V(T_i)$ are isomorphic embeddings $\iota_i : T_p \to T_i$, for $i = 1, 2$,
and therefore $T_p$ is a common isomorphic subtree of $T_1$ and $T_2$.

Proof. We already know from Proposition 3 that $T_p$ is a tree and that $\iota_1 : T_p \to T_1$ and
$\iota_2 : T_p \to T_2$ are homeomorphic embeddings, i.e., that if $(a, b) \in E(T_p)$, then there are
elementary paths $a \rightsquigarrow b$ in $T_1$ and in $T_2$ without any intermediate node in $V(T_1) \cap V(T_2)$.
We want to prove that each one of these paths consists of a single arc, i.e., that $a$ is the
parent of $b$ in both trees.

Let $c_1$ be the parent of $b$ in $T_1$ and $c_2$ the parent of $b$ in $T_2$: they exist because there
is a path $a \rightsquigarrow b$ in each tree. Then, since $T_1$ and $T_2$ are isomorphic subtrees of $T$, both $c_1$
and $c_2$ are parents of $b$ in $T$, and therefore $c_1 = c_2 \in V(T_1) \cap V(T_2)$. So, the parents in $T_1$
and in $T_2$ of $b$ are the same and they belong to $V(T_1) \cap V(T_2)$. Since the paths $a \rightsquigarrow b$ in $T_1$
and in $T_2$ do not contain any intermediate node in $V(T_1) \cap V(T_2)$ and they must contain
$c_1$ and $c_2$, respectively, this implies that $a = c_1 = c_2$, as we claimed. □

We have finally the following result, which gives an algebraic content to the construction
of intersections in $\mathrm{Tree}_{iso}$, $\mathrm{Tree}_{hom}$, and $\mathrm{Tree}_{top}$.

Proposition 5. Let $\mathrm{Tree}_*$ denote any category $\mathrm{Tree}_{iso}$, $\mathrm{Tree}_{hom}$, or $\mathrm{Tree}_{top}$. For every pair
of $\mathrm{Tree}_*$-embeddings $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$,

$(T_p,\ \iota_1 : T_p \to T_1,\ \iota_2 : T_p \to T_2)$

is a pullback of $f_1$ and $f_2$ in $\mathrm{Tree}_*$.

Proof. We know from the previous propositions that, in each case, $T_p$ is a tree and
$\iota_1 : T_p \to T_1$ and $\iota_2 : T_p \to T_2$ are $\mathrm{Tree}_*$-embeddings, and it is clear that
$f_1 \circ \iota_1 = f_2 \circ \iota_2$. Let us check now the universal property of pullbacks in $\mathrm{Tree}_*$.

Let $S$ be any tree and let $g_1 : S \to T_1$ and $g_2 : S \to T_2$ be two $\mathrm{Tree}_*$-embeddings such
that $f_1 \circ g_1 = f_2 \circ g_2$. Then, at the level of nodes, there exists a unique mapping
$g : V(S) \to V(T_1) \cap V(T_2) = V(T_p)$ such that each $g_i$ is equal to $g$ followed by the corresponding
inclusion $\iota_i : V(T_p) \hookrightarrow V(T_i)$. And since each $\iota_i : T_p \to T_i$ and each composition
$g_i = \iota_i \circ g : S \to T_i$ are $\mathrm{Tree}_*$-embeddings, Lemma 5 implies that $g$ is a $\mathrm{Tree}_*$-embedding
from $S$ to $T_p$. This is the unique $\mathrm{Tree}_*$-embedding that, when composed with $\iota_1$ and $\iota_2$,
yields $g_1$ and $g_2$, respectively. □

Therefore, the categories $\mathrm{Tree}_{iso}$, $\mathrm{Tree}_{hom}$, and $\mathrm{Tree}_{top}$ have all binary pullbacks. It is
not the case with $\mathrm{Tree}_{min}$, as the following simple example shows.

Remark 1. The minor embeddings $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$ corresponding to the
minors described in Example 2 do not have a pullback in $\mathrm{Tree}_{min}$. Indeed, let $P$, together
with $g_1 : P \to T_1$ and $g_2 : P \to T_2$, be a pullback of them in $\mathrm{Tree}_{min}$. Then, since
$f_1 \circ g_1 = f_2 \circ g_2 : V(P) \to V(T)$, we have that $g_1(V(P)) \subseteq \{b, c\}$ and $g_2(V(P)) \subseteq \{b, c\}$ and
hence, $P$ being a tree and $g_1$ and $g_2$ being minor embeddings, there are only two possibilities
for $P$:

— $P$ is empty. In this case, if we consider a tree $Q$ with one node $q$ and no arc, and the
minor embeddings $h_1 : Q \to T_1$ and $h_2 : Q \to T_2$ given by $h_1(q) = h_2(q) = c$, then
$f_1 \circ h_1 = f_2 \circ h_2$ but there is no minor embedding $h : Q \to P$ (because $P$ is empty),
which contradicts the definition of pullback.

— $P$ consists of only one node, say $x$, and no arc, and $g_1$ and $g_2$ send $x$ to the same
node, $b$ or $c$, in $T_1$ and in $T_2$. But then if we consider the same tree $Q$ as before and the
minor embeddings $h_1 : Q \to T_1$ and $h_2 : Q \to T_2$ that send $q$ to the node different from
$g_1(x)$ and $g_2(x)$, there is again no minor embedding $h : Q \to P$ such that $h_1 = g_1 \circ h$
and $h_2 = g_2 \circ h$, which contradicts the definition of pullback.

Nevertheless, arguing as in the proof of Proposition 5 we obtain the following result.

Proposition 6. If $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$ are minor embeddings such that $T_p$ is a
tree, then $(T_p,\ \iota_1 : T_p \to T_1,\ \iota_2 : T_p \to T_2)$ is a pullback of $f_1$ and $f_2$ in $\mathrm{Tree}_{min}$.

Proof. We know from the proof of Proposition 1 that if $T_p$ is a tree, then $\iota_1 : T_p \to T_1$ and
$\iota_2 : T_p \to T_2$ are minor embeddings, and it is clear that $f_1 \circ \iota_1 = f_2 \circ \iota_2$. Then, exactly the
same argument used in Proposition 5 shows that, in this case, $(T_p,\ \iota_1 : T_p \to T_1,\ \iota_2 : T_p \to T_2)$
satisfies the universal property of pullbacks in $\mathrm{Tree}_{min}$. □

4 Common supertrees as pushouts 

In this section we study the construction of common supertrees as pushouts of embeddings
of largest common subtrees, for each one of the types of tree embeddings considered in this
paper. Let $\mathrm{Tree}_*$ be henceforth any one of the categories of trees $\mathrm{Tree}_{iso}$, $\mathrm{Tree}_{hom}$, $\mathrm{Tree}_{top}$,
or $\mathrm{Tree}_{min}$.

Let $T_1$ and $T_2$ be two trees. Let $T_\mu$ be a largest common $\mathrm{Tree}_*$-subtree of them, and
let $m_1 : T_\mu \to T_1$ and $m_2 : T_\mu \to T_2$ be any $\mathrm{Tree}_*$-embeddings. Let $T_1 + T_2$ be the graph
obtained as the disjoint sum of the trees $T_1$ and $T_2$: that is,

$V(T_1 + T_2) = V(T_1) \sqcup V(T_2), \qquad E(T_1 + T_2) = E(T_1) \sqcup E(T_2).$

Let $\theta$ be the equivalence relation on $V(T_1) \sqcup V(T_2)$ defined, up to symmetry, by the following
condition:

$(a, b) \in \theta$ if and only if $a = b$ or there exists some $c \in V(T_\mu)$ such that $a = m_1(c)$
and $b = m_2(c)$.



We shall denote the equivalence class modulo $\theta$ of an element $x \in V(T_1) \sqcup V(T_2)$ by $[x]$.
Let $T_{po}$ be the quotient graph of $T_1 + T_2$ by this equivalence:

— its set of nodes $V(T_{po})$ is the quotient set $(V(T_1) \sqcup V(T_2))/\theta$, with elements the
equivalence classes of the nodes of $T_1$ or $T_2$;

— its arcs are those induced by the arcs in $T_1$ or $T_2$, in the sense that $([a], [b]) \in E(T_{po})$ if
and only if there exist $a' \in [a]$, $b' \in [b]$ and some $i = 1, 2$ such that $(a', b') \in E(T_i)$.

Note that every equivalence class $[a] \in V(T_{po})$ is either a 2-element set $\{m_1(x), m_2(x)\}$,
with $x \in V(T_\mu)$, or a singleton $\{a\}$, with $a \in V(T_i) - m_i(V(T_\mu))$ for some $i = 1, 2$. Since
every node in $T_1$ and $T_2$ has in-degree at most 1, every $[a] \in V(T_{po})$ has in-degree at most
2, and if it is 2, then $[a]$ must be of the first kind.

Let $\ell_i : V(T_i) \to V(T_{po})$, $i = 1, 2$, denote the inclusion $V(T_i) \hookrightarrow V(T_1) \sqcup V(T_2)$ followed
by the quotient mapping $V(T_1) \sqcup V(T_2) \to (V(T_1) \sqcup V(T_2))/\theta$: that is, $\ell_i(x) = [x]$ for every
$x \in V(T_i)$. Note that, by construction,

$V(T_{po}) = \ell_1(V(T_1)) \cup \ell_2(V(T_2))$

and

$\ell_1(V(T_1)) \cap \ell_2(V(T_2)) = \ell_1(m_1(V(T_\mu))) = \ell_2(m_2(V(T_\mu))).$

It is straightforward to check that these mappings $\ell_i$ are injective, satisfy that
$\ell_1 \circ m_1 = \ell_2 \circ m_2$, and they define morphisms of graphs $\ell_i : T_i \to T_{po}$, $i = 1, 2$, in the sense that if
$(a, b) \in E(T_i)$, then $(\ell_i(a), \ell_i(b)) \in E(T_{po})$.

We shall call this graph $T_{po}$, together with these injective morphisms $\ell_i : T_i \to T_{po}$,
$i = 1, 2$, the join of $T_1$ and $T_2$ obtained through $m_1$ and $m_2$.
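
The construction of the join can be sketched in a few lines of Python. This is only an illustration under stated assumptions: trees are given as a set of nodes and a set of arcs, the embeddings $m_1$ and $m_2$ are dictionaries with the nodes of $T_\mu$ as keys, nodes of $T_i$ are tagged as `(i, x)` to make the sum disjoint, and each equivalence class is represented as a `frozenset` of tagged nodes. The names `join_graph` and `cls` are hypothetical.

```python
def join_graph(nodes1, arcs1, nodes2, arcs2, m1, m2):
    """Sketch of the join T_po of T_1 and T_2 through m1 and m2.

    Returns the set of equivalence classes, the set of induced arcs, and
    a function cls(i, x) giving the class of node x of tree T_i.
    """
    to_t2 = {m1[c]: m2[c] for c in m1}  # m1(c) is glued with m2(c)
    to_t1 = {m2[c]: m1[c] for c in m2}

    def cls(i, x):
        # Classes are 2-element sets {m1(c), m2(c)} or singletons.
        if i == 1 and x in to_t2:
            return frozenset({(1, x), (2, to_t2[x])})
        if i == 2 and x in to_t1:
            return frozenset({(1, to_t1[x]), (2, x)})
        return frozenset({(i, x)})

    classes = {cls(1, a) for a in nodes1} | {cls(2, b) for b in nodes2}
    # Arcs of the quotient are exactly those induced by arcs of T_1 or T_2.
    arcs = {(cls(1, a), cls(1, b)) for (a, b) in arcs1}
    arcs |= {(cls(2, a), cls(2, b)) for (a, b) in arcs2}
    return classes, arcs, cls
```

As a small usage example, take $T_1$ the chain $1 \to 2 \to 3$, $T_2$ the single arc $p \to q$, and the two-node tree $T_\mu$ embedded by $m_1 = \{c_0 \mapsto 1, c_1 \mapsto 3\}$ and $m_2 = \{c_0 \mapsto p, c_1 \mapsto q\}$. The join then has three nodes and three arcs, and the class $\{3, q\}$ has in-degree 2.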

Lemma 7. Let $T_1$ and $T_2$ be trees, let $T_\mu$ be a largest common $\mathrm{Tree}_*$-subtree of $T_1$ and $T_2$,
let $m_1 : T_\mu \to T_1$ and $m_2 : T_\mu \to T_2$ be any $\mathrm{Tree}_*$-embeddings, and let $T_{po}$ be the join of $T_1$
and $T_2$ obtained through $m_1$ and $m_2$.

(i) If $r$ is the root of $T_\mu$, then $m_1(r)$ is the root of $T_1$ or $m_2(r)$ is the root of $T_2$.
(ii) For every $x, y \in V(T_\mu)$, if $T_{po}$ contains a path from $[m_1(x)] = [m_2(x)]$ to
$[m_1(y)] = [m_2(y)]$, then $T_\mu$ contains a path from $x$ to $y$.
(iii) $T_{po}$ contains no circuit.

Proof. (i) Assume that both $m_1(r)$ and $m_2(r)$ have parents, say $v_1$ and $v_2$, respectively.
Lemma 2 implies that $v_i \notin m_i(V(T_\mu))$, for each $i = 1, 2$: otherwise, there would be an arc
in $T_\mu$ from the preimage of $v_i$ to $r$. Then, we can enlarge $T_\mu$ by adding a new node $r_0$ and
an arc $(r_0, r)$ and we can extend $m_1$ and $m_2$ to this new tree by sending $r_0$ to $v_1$ and $v_2$,
respectively. In this way we obtain a tree strictly larger than $T_\mu$ and $\mathrm{Tree}_*$-embeddings of
this new tree into $T_1$ and $T_2$, against the assumption that $T_\mu$ is a largest $\mathrm{Tree}_*$-subtree of
them.

(ii) We shall prove that if $T_{po}$ contains a path $[m_1(x)] \rightsquigarrow [m_1(y)]$, then $T_\mu$ contains
a path $x \rightsquigarrow y$, by induction on its number $n$ of intermediate nodes in
$\ell_1(m_1(V(T_\mu))) = \ell_2(m_2(V(T_\mu)))$.



If $n = 0$, that is, if no intermediate node in the path $[m_1(x)] \rightsquigarrow [m_1(y)]$ comes from a
node of $T_\mu$, then all intermediate nodes come only from one of the trees $T_1$ or $T_2$: assume,
to fix ideas, that they come from $T_1$, and that this path is

$([m_1(x)], [v_1], \ldots, [v_k], [m_1(y)]),$

with $[v_1], \ldots, [v_k] \in \ell_1(V(T_1)) - \ell_1(m_1(V(T_\mu)))$. Since the nodes belonging to
$\ell_1(V(T_1)) - \ell_1(m_1(V(T_\mu)))$ are (as equivalence classes) singletons, and an arc in $T_{po}$ involving one node
of this set must be induced by an arc in $E(T_1)$, we conclude that there exists a path

$(m_1(x), v_1, \ldots, v_k, m_1(y))$

in $T_1$. Since $m_1$ is a $\mathrm{Tree}_*$-embedding, and in particular a minor embedding, by Lemma 2
this implies that there exists a path $x \rightsquigarrow y$ in $T_\mu$.

As the induction hypothesis, assume that the claim is true for paths in $T_{po}$ with $n$
intermediate nodes in $\ell_1(m_1(V(T_\mu))) = \ell_2(m_2(V(T_\mu)))$, and assume now that the path
$[m_1(x)] \rightsquigarrow [m_1(y)]$ has $n + 1$ such nodes. Let $[m_1(a)]$ be the first intermediate node of this
path belonging to $\ell_1(m_1(V(T_\mu)))$. Then, by the case $n = 0$, there is a path $x \rightsquigarrow a$ in $T_\mu$,
and by the induction hypothesis there is a path $a \rightsquigarrow y$; by concatenating them we obtain
the path $x \rightsquigarrow y$ in $T_\mu$ we were looking for.

(iii) Assume that $T_{po}$ contains a circuit. If at most one node in this circuit belongs
to $\ell_1(m_1(V(T_\mu)))$, then, arguing as in the proof of (ii), we conclude that all arcs in this
circuit are induced by arcs in the same tree $T_1$ or $T_2$, and they would form a circuit in
this tree, which is impossible. Therefore, two different nodes in this circuit must belong to
$\ell_1(m_1(V(T_\mu)))$. This implies that there exist $x, y \in V(T_\mu)$, $x \neq y$, such that $T_{po}$ contains
a path $[m_1(x)] \rightsquigarrow [m_1(y)]$ and a path $[m_1(y)] \rightsquigarrow [m_1(x)]$. By point (ii), this implies that $T_\mu$
contains a path $x \rightsquigarrow y$ and a path $y \rightsquigarrow x$, and hence a circuit, which is impossible. Therefore,
$T_{po}$ cannot contain any circuit. □

Proposition 7. Let $T_1$ and $T_2$ be trees, let $T_\mu$ be a largest common $\mathrm{Tree}_*$-subtree of $T_1$ and
$T_2$, let $m_1 : T_\mu \to T_1$ and $m_2 : T_\mu \to T_2$ be any $\mathrm{Tree}_*$-embeddings, and let $T_{po}$ be the join of
$T_1$ and $T_2$ obtained through $m_1$ and $m_2$.

(i) For every $v, w \in V(T_{po})$, if $(v, w) \in E(T_{po})$ and there is another path $v \rightsquigarrow w$ in $T_{po}$,
then $v, w \in \ell_1(V(T_1)) \cap \ell_2(V(T_2))$; this path is unique, it is a $\mathrm{Tree}_*$-path and it has no
intermediate node in $\ell_1(V(T_1)) \cap \ell_2(V(T_2))$. In particular, if $\mathrm{Tree}_*$ is $\mathrm{Tree}_{iso}$, then this
situation cannot happen.

(ii) For every $v, w \in V(T_{po})$, if there are two different paths from $v$ to $w$ in $T_{po}$ without any
common intermediate node, then one of them is the arc $(v, w)$, and then (i) applies. In
particular, again, this situation cannot happen if $\mathrm{Tree}_*$ is $\mathrm{Tree}_{iso}$.

Proof. (i) If $(v, w) \in E(T_{po})$, then there exist, say, $a, b \in V(T_1)$ such that $v = [a]$, $w = [b]$,
and $(a, b) \in E(T_1)$. Assume now that there is another path from $[a]$ to $[b]$ in $T_{po}$. Since $T_{po}$
does not contain circuits by point (iii) in the last lemma, this path cannot contain $[a]$ or $[b]$
as an intermediate node, and therefore its first intermediate node is different from $[b]$ and
its last intermediate node is different from $[a]$. In particular, $[b]$ has in-degree 2 in $T_{po}$.

This implies that there exists some $y \in V(T_\mu)$ such that $b = m_1(y)$ and there exists
some $c \in V(T_2)$ such that $(c, m_2(y)) \in E(T_2)$, and that there is a non-trivial path in $T_{po}$
from $[a]$ to $[c]$. Since $a \in V(T_1)$ and $c \in V(T_2)$, this path must contain some node belonging
to $\ell_1(V(T_1)) \cap \ell_2(V(T_2))$. If it is not $[a]$, then let $[m_1(x)]$ be the first intermediate node in the
path $[a] \rightsquigarrow [c]$ coming from $T_\mu$. Since, in this case, $a \in V(T_1) - m_1(V(T_\mu))$, all intermediate
nodes in the path $[a] \rightsquigarrow [m_1(x)]$ come also from $V(T_1) - m_1(V(T_\mu))$, and therefore there
exists a path $a \rightsquigarrow m_1(x)$ in $T_1$. But, on the other hand, since there is a path $[m_1(x)] \rightsquigarrow [m_1(y)]$
in $T_{po}$ (consisting of the path $[m_1(x)] \rightsquigarrow [c]$ followed by the arc $([c], [m_1(y)])$), from Lemma
7.(ii) we deduce that there exists a path $x \rightsquigarrow y$ in $T_\mu$ and hence a path $m_1(x) \rightsquigarrow m_1(y) = b$
in $T_1$. Summarizing, if $a \notin m_1(V(T_\mu))$, then $T_1$ contains both an arc from $a$ to $m_1(y)$ and
a non-trivial path from $a$ to $m_1(y)$ (through $m_1(x)$), which is impossible.

So, $a = m_1(x)$ for some $x \in V(T_\mu)$. Since $(m_1(x), m_1(y)) \in E(T_1)$, Lemma 2 implies
that $(x, y) \in E(T_\mu)$, and therefore there exists a $\mathrm{Tree}_*$-path in $T_2$ from $m_2(x)$ to $m_2(y)$
without any intermediate node in $m_2(V(T_\mu))$: the uniqueness of paths in trees implies that
this path contains $c$ as its last intermediate node before $m_2(y)$. To begin with, this already
shows that the situation considered in this point cannot happen if $\mathrm{Tree}_*$ is $\mathrm{Tree}_{iso}$: a
$\mathrm{Tree}_{iso}$-path is an arc, and therefore it does not contain any intermediate node.

Thus, we assume now that $\mathrm{Tree}_*$ is $\mathrm{Tree}_{min}$, $\mathrm{Tree}_{hom}$ or $\mathrm{Tree}_{top}$. The $\mathrm{Tree}_*$-path
$m_2(x) \rightsquigarrow m_2(y)$ in $T_2$ without any intermediate node in $m_2(V(T_\mu))$ and containing $c$ as the last
intermediate node induces a path from $[m_2(x)] = [a]$ to $[b]$ in $T_{po}$ containing $[c]$ and with
all its intermediate nodes in $\ell_2(V(T_2)) - \ell_2(m_2(V(T_\mu)))$. This path is a $\mathrm{Tree}_*$-path. If $\mathrm{Tree}_*$
stands for $\mathrm{Tree}_{min}$ or $\mathrm{Tree}_{top}$, it is obvious, because in these cases $\mathrm{Tree}_*$-paths are simply
paths. If $\mathrm{Tree}_*$ is $\mathrm{Tree}_{hom}$, then all intermediate nodes in the path $m_2(x) \rightsquigarrow m_2(y)$ in $T_2$
have only one child, and since they belong to $V(T_2) - m_2(V(T_\mu))$ and therefore they are
not identified with any node from $T_1$, their equivalence classes in $T_{po}$ have also out-degree
1, and hence the path $[m_2(x)] \rightsquigarrow [m_2(y)]$ in $T_{po}$ it induces is also elementary.

This proves that $v, w \in \ell_1(V(T_1)) \cap \ell_2(V(T_2))$ and that, besides the arc $(v, w)$, there
exists a $\mathrm{Tree}_*$-path $v \rightsquigarrow w$ without any intermediate node in $\ell_1(V(T_1)) \cap \ell_2(V(T_2))$, which
contains $[c]$. Assume finally that there exists a "third" path from $v$ to $w$ other than the arc
and this $\mathrm{Tree}_*$-path. Since $w$ has in-degree at most 2 and $T_{po}$ contains no circuit, arguing
as in the first paragraph of this proof we deduce that this path consists of a path from $v$
to $[c]$ followed by the arc $([c], w)$. But $[c]$ has in-degree 1 in $T_{po}$, as well as all intermediate
nodes in the path from $v$ to $[c]$ induced by the path $m_2(x) \rightsquigarrow c$ in $T_2$. Therefore, this is
the only path in $T_{po}$ from $v$ to $[c]$. This shows that there is only one path from $v$ to $w$ in
$T_{po}$ other than the arc $(v, w)$, and it is the $\mathrm{Tree}_*$-path without any intermediate node in
$\ell_1(V(T_1)) \cap \ell_2(V(T_2))$ obtained above.

(ii) Assume that there exist two different paths from $v$ to $w$ without any intermediate
node in common, and let $v_1$ and $v_2$ be the nodes that precede $w$ in each one of these two
paths; by assumption $v_1 \neq v_2$ and $(v_1, w), (v_2, w) \in E(T_{po})$. Then, $w$ has in-degree 2 in
$T_{po}$, and this implies that there exist $y \in V(T_\mu)$, $b \in V(T_1)$ and $c \in V(T_2)$ such that, say,
$v_1 = [b]$, $v_2 = [c]$, $w = [m_1(y)] = [m_2(y)]$, and $(b, m_1(y)) \in E(T_1)$, $(c, m_2(y)) \in E(T_2)$. By
Lemma 7.(i), $y$ has a parent $x$ in $T_\mu$, and then there are $\mathrm{Tree}_*$-paths $m_1(x) \rightsquigarrow m_1(y)$ in $T_1$
and $m_2(x) \rightsquigarrow m_2(y)$ in $T_2$. This yields, up to symmetry, three possibilities:

— If $m_1(x) = b$ and $m_2(x) = c$, then $[b] = [c]$, against the assumption $v_1 \neq v_2$. Besides, if
$\mathrm{Tree}_* = \mathrm{Tree}_{iso}$, then, since $\mathrm{Tree}_{iso}$-paths are arcs, it must happen that $m_1(x) = b$ and
$m_2(x) = c$. So, if $\mathrm{Tree}_* = \mathrm{Tree}_{iso}$, the situation described in the point we are proving
cannot happen. In the remaining two cases we understand that $\mathrm{Tree}_* \neq \mathrm{Tree}_{iso}$.

— If $m_1(x) = b$ and $m_2(x) \neq c$, then in $T_{po}$ we have on the one hand the arc $([b], w)$ and on
the other hand a path $[b] \rightsquigarrow w$ induced by the path $m_2(x) \rightsquigarrow m_2(y)$ in $T_2$: since $c$ is the
parent of $m_2(y)$ in $T_2$, it is the last intermediate node in the path $m_2(x) \rightsquigarrow m_2(y)$, and
therefore $[c]$ is the last intermediate node in the path $[b] \rightsquigarrow w$ induced by $m_2(x) \rightsquigarrow m_2(y)$.
By (i), these are the only two paths from $[b]$ to $w$.

Let us prove now that the path $v \rightsquigarrow w$ containing $[c]$ also contains $[b]$. Assume first
that this path contains some node in $\ell_2(m_2(V(T_\mu)))$ other than $w$, and let $[m_2(z)]$ be the
last such node before $w$. This means that $T_{po}$ contains a path $[m_2(z)] \rightsquigarrow [m_2(y)]$ and
therefore, by Lemma 7.(ii), there is a path $z \rightsquigarrow y$ in $T_\mu$. But then this path must contain
the parent $x$ of $y$, which implies that the path $[m_2(z)] \rightsquigarrow [m_2(y)]$, and hence the path
$v \rightsquigarrow w$ through $[c]$, contains in this case $[b] = [m_2(x)]$.

Assume now that the path $v \rightsquigarrow w$ containing $[c]$ does not contain any node in $\ell_2(m_2(V(T_\mu)))$
other than $w$. Since $c \in V(T_2)$, this would mean that this path is completely induced by
a path in $T_2$, that is, $v = [a]$ for some $a \in V(T_2) - m_2(V(T_\mu))$ and there exists a path
$(a, \ldots, c, m_2(y))$ in $T_2$ with no intermediate node in $m_2(V(T_\mu))$. In this case, since there
is a path $m_2(x) \rightsquigarrow m_2(y)$ in $T_2$ and $m_2(x)$ is not contained in the path $a \rightsquigarrow m_2(y)$, there
would exist a non-trivial path $m_2(x) \rightsquigarrow a$, which would induce a path from $[b] = [m_2(x)]$
to $v = [a]$ forming a circuit with the path $v \rightsquigarrow [b]$. So, this case cannot happen.

So, the path $v \rightsquigarrow w$ containing $[c]$ also contains $[b]$. But, by assumption, the paths $v \rightsquigarrow w$
containing $[b]$ and $[c]$ have no common intermediate node. Therefore, it must happen
that $v = [b]$, and hence one of the paths from $v$ to $w$ is an arc, as it is claimed in the
statement.

— If $m_1(x) \neq b$ and $m_2(x) \neq c$, then there are $\mathrm{Tree}_*$-paths $(m_1(x), \ldots, b, m_1(y))$ and
$(m_2(x), \ldots, c, m_2(y))$ in $T_1$ and $T_2$, respectively, without intermediate nodes coming
from $V(T_\mu)$.

In this case, we can enlarge $T_\mu$ by adding a new node $x_0$ and replacing the arc $(x, y)$
by two arcs $(x, x_0)$ and $(x_0, y)$, and we can extend $m_1$ and $m_2$ to this new node by
sending it, respectively, to $b$ and $c$. It is clear that this new tree is strictly larger than
$T_\mu$. Moreover, the extensions of $m_1$ and $m_2$ are $\mathrm{Tree}_*$-embeddings: the new arc $(x, x_0)$
is transformed under them into the $\mathrm{Tree}_*$-paths (without intermediate nodes coming
from $V(T_\mu)$) that go from $m_1(x)$ to $b$ and from $m_2(x)$ to $c$, respectively; the new arc
$(x_0, y)$ is transformed under them into the arcs $(b, m_1(y))$ and $(c, m_2(y))$, respectively;
and it is clear that if $m_1$ and $m_2$ were topological embeddings, then their extensions
are still so, because the new node $x_0$ has only one child. Thus, in this way we obtain
a new common $\mathrm{Tree}_*$-subtree of $T_1$ and $T_2$ that is strictly larger than $T_\mu$, which yields a
contradiction.

Summarizing, if $\mathrm{Tree}_*$ is $\mathrm{Tree}_{iso}$, then there cannot exist two different paths $v \rightsquigarrow w$, and
if $\mathrm{Tree}_*$ is $\mathrm{Tree}_{hom}$, $\mathrm{Tree}_{top}$, or $\mathrm{Tree}_{min}$, there can exist two different paths $v \rightsquigarrow w$ without
common intermediate nodes, but then the only case that does not yield a contradiction is
when one of these paths is an arc. □

Let now $T_\sigma$ be the graph obtained from $T_{po}$ by removing every arc that is subsumed by
a path: that is, we remove from $T_{po}$ each arc $(v, w)$ for which there is another path $v \rightsquigarrow w$
in $T_{po}$. Note in particular that $V(T_\sigma) = V(T_{po})$. We shall call this graph the $\mathrm{Tree}_*$-sum of
$T_1$ and $T_2$ obtained through $m_1$ and $m_2$.

As a direct consequence of Proposition 7.(i), we have that if $\mathrm{Tree}_* = \mathrm{Tree}_{iso}$, then $T_\sigma = T_{po}$,
because if $(v, w) \in E(T_{po})$, there does not exist any other path $v \rightsquigarrow w$, and therefore no
arc is removed from $T_{po}$ in the construction of $T_\sigma$. In the other three categories, still by
Proposition 7.(i) and its proof, if the arc $([a], [b])$ induced by an arc, say, $(a, b) \in E(T_1)$ is
removed because of the existence of a second path $[a] \rightsquigarrow [b]$, then $a, b \in m_1(V(T_\mu))$, this second
path is a $\mathrm{Tree}_*$-path, and all its intermediate nodes are equivalence classes of nodes in
$V(T_2) - m_2(V(T_\mu))$. In particular, since the arcs $(v, w)$ removed in the construction of $T_\sigma$
are such that $v, w \in \ell_1(m_1(V(T_\mu))) = \ell_2(m_2(V(T_\mu)))$ and the $\mathrm{Tree}_*$-paths that cause these
arcs to be removed have no intermediate node in this set, these paths are not modified in
the construction of $T_\sigma$, and the arcs can be removed in any order.
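
The passage from the join to the $\mathrm{Tree}_*$-sum can be illustrated by the following Python sketch. It assumes that the join is given as a set of hashable nodes and a set of arcs `(v, w)`, and that it contains no circuit (Lemma 7.(iii)), so that "another path" simply means a path that does not use the arc itself. Since the arcs can be removed in any order, every test is made against the original join. The function names are hypothetical.

```python
def has_other_path(arcs, v, w):
    """True if w can be reached from v without using the arc (v, w)."""
    successors = {}
    for (a, b) in arcs:
        if (a, b) != (v, w):
            successors.setdefault(a, []).append(b)
    stack, seen = [v], {v}
    while stack:  # plain depth-first search from v
        x = stack.pop()
        for y in successors.get(x, []):
            if y == w:
                return True
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def tree_sum(nodes, arcs):
    """Sketch of the sum: drop every arc subsumed by another path of the join."""
    kept = {(v, w) for (v, w) in arcs if not has_other_path(arcs, v, w)}
    return set(nodes), kept
```

For a join with arcs $(A, B)$, $(B, C)$ and $(A, C)$, the arc $(A, C)$ is subsumed by the path through $B$ and is the only one removed.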

Proposition 8. For every two trees $T_1$ and $T_2$, any $\mathrm{Tree}_*$-sum of $T_1$ and $T_2$ is a common
$\mathrm{Tree}_*$-supertree of them.

Proof. Let $T_1$ and $T_2$ be two trees, let $T_\mu$ be a largest common $\mathrm{Tree}_*$-subtree of them and
let $m_1 : T_\mu \to T_1$ and $m_2 : T_\mu \to T_2$ be any $\mathrm{Tree}_*$-embeddings. Let $T_\sigma$ be the $\mathrm{Tree}_*$-sum of
$T_1$ and $T_2$ obtained through $m_1$ and $m_2$, and let $\ell_i : V(T_i) \to V(T_\sigma) = (V(T_1) \sqcup V(T_2))/\theta$,
$i = 1, 2$, stand for the corresponding restrictions of the quotient mappings.

Every arc removed from the join $T_{po}$ of $T_1$ and $T_2$ in the construction of $T_\sigma$ is subsumed
by a path in $T_{po}$. This implies that, for every $x, y \in V(T_{po})$, there is a path $x \rightsquigarrow y$ in $T_{po}$ if
and only if there is a path $x \rightsquigarrow y$ in $T_\sigma$. In particular, since the only nodes in $T_{po}$ that can
possibly have no parent are the images of the roots of $T_1$ and $T_2$, the same also happens in
$T_\sigma$.

Now, by Lemma 7.(i), if $r$ is the root of $T_\mu$ then $m_1(r)$ is the root $r_1$ of $T_1$ or $m_2(r)$
is the root $r_2$ of $T_2$. If $m_1(r) = r_1$ and $m_2(r) = r_2$, then $[r_1] = [r_2]$ is the only node in
$T_\sigma$ without parent, and every node $v$ in $T_{po}$ (as well as in $T_\sigma$, as we said) can be reached
from this node through a path: if $v = [a_1]$, with $a_1 \in V(T_1)$, through the image of the path
$r_1 \rightsquigarrow a_1$ in $T_1$, and if $v = [a_2]$, with $a_2 \in V(T_2)$, through the image of the path $r_2 \rightsquigarrow a_2$ in
$T_2$. If, on the contrary, say, $m_1(r) = r_1$ but $m_2(r) \neq r_2$, then $[r_2]$ is the only node in $T_\sigma$
with no parent and every node in $T_\sigma$ can be reached from this node through a path: every
node of the form $[a_2]$, with $a_2 \in V(T_2)$, through the image of the path $r_2 \rightsquigarrow a_2$ in $T_2$, and
every node of the form $[a_1]$, with $a_1 \in V(T_1)$, through the path obtained by concatenating
the image of the path $r_2 \rightsquigarrow m_2(r)$ in $T_2$ and the image of the path $r_1 \rightsquigarrow a_1$ in $T_1$.

Thus, $T_\sigma$ has one, and only one, node without parent, and every other node in $T_\sigma$ can
be reached from it through a path. Moreover, every node in $T_\sigma$ has in-degree at most 1.
Indeed, if a node $w$ has in-degree 2 in $T_{po}$, say $(v_1, w), (v_2, w) \in E(T_{po})$, then there will
exist some node $v$ and paths $v \rightsquigarrow v_1$ and $v \rightsquigarrow v_2$ with no common intermediate node. But
then, by Proposition 7.(ii), $v$ will be one of the nodes $v_1$ or $v_2$, say $v = v_1$, and then the arc
$(v_1, w) \in E(T_{po})$ is subsumed by the path $v_1 \rightsquigarrow w$ through $v_2$, and hence it is removed in
the construction of $T_\sigma$, leaving only the arc $(v_2, w)$. So, every node in $T_\sigma$ has in-degree at
most 1, and it can be reached through a path from the only node without a parent. This
proves that $T_\sigma$ is a tree.
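
The three conditions used in this argument (exactly one node without a parent, in-degree at most 1 everywhere, and every node reachable from the parentless node) are easy to test mechanically. The following Python sketch is only a sanity check for small examples, under the assumption that the graph is given as a set of nodes and a set of arcs; the name `is_rooted_tree` is hypothetical.

```python
def is_rooted_tree(nodes, arcs):
    """Check the three tree conditions used in the proof on a finite digraph."""
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for (v, w) in arcs:
        indegree[w] += 1
        children[v].append(w)
    roots = [v for v in nodes if indegree[v] == 0]
    # Exactly one node without a parent, and no node with two parents.
    if len(roots) != 1 or any(d > 1 for d in indegree.values()):
        return False
    # Every node must be reachable from the unique parentless node.
    seen, stack = {roots[0]}, [roots[0]]
    while stack:
        x = stack.pop()
        for y in children[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(indegree)
```

A chain $A \to B \to C$ passes the check, while the same chain with the extra arc $(A, C)$ fails it, because $C$ then has in-degree 2.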

Now we have to prove that $\ell_1 : T_1 \to T_\sigma$ and $\ell_2 : T_2 \to T_\sigma$ are $\mathrm{Tree}_*$-embeddings. We
shall prove that $\ell_1$ is a $\mathrm{Tree}_*$-embedding. Recall that the mapping $\ell_1 : V(T_1) \to V(T_{po}) = V(T_\sigma)$
is injective, and note that, by Proposition 7, if $(a, b) \in E(T_1)$, then there is a $\mathrm{Tree}_*$-path
in $T_\sigma$ from $\ell_1(a) = [a]$ to $\ell_1(b) = [b]$ that does not contain any intermediate node in
$\ell_1(V(T_1)) \cap \ell_2(V(T_2))$: either the arc $([a], [b])$ induced by the arc in $T_1$ or the $\mathrm{Tree}_*$-path
$[a] \rightsquigarrow [b]$ that caused this arc to be removed. This shows that $\ell_1$ is a $\mathrm{Tree}_*$-embedding when
$\mathrm{Tree}_*$ is $\mathrm{Tree}_{iso}$, $\mathrm{Tree}_{hom}$, or $\mathrm{Tree}_{min}$.

In the case of $\mathrm{Tree}_{top}$, it remains to prove that if $(a, b), (a, c) \in E(T_1)$, then the paths
$[a] \rightsquigarrow [b]$ and $[a] \rightsquigarrow [c]$ are divergent. Up to symmetry, there are three possibilities to discuss:

— If the paths $[a] \rightsquigarrow [b]$ and $[a] \rightsquigarrow [c]$ are both arcs, then the injectivity of $\ell_1$ implies that
they are different and therefore they define divergent paths.

— If the path $[a] \rightsquigarrow [b]$ is an arc and the path $[a] \rightsquigarrow [c]$ has intermediate nodes, and if
they did not diverge, $[b]$ would be the first intermediate node of the path $[a] \rightsquigarrow [c]$. But
this is impossible, because, since the arc $([a], [c]) \in E(T_{po})$ has been removed in the
construction of $T_\sigma$, all intermediate nodes of the path $[a] \rightsquigarrow [c]$ are equivalence classes of
nodes in $V(T_2) - m_2(V(T_\mu))$.

— If both paths $[a] \rightsquigarrow [b]$ and $[a] \rightsquigarrow [c]$ have intermediate nodes, then both arcs
$([a], [b]), ([a], [c]) \in E(T_{po})$ have been removed in the construction of $T_\sigma$, and therefore there are
$x, y, z \in V(T_\mu)$ such that $(x, y), (x, z) \in E(T_\mu)$, $m_1(x) = a$, $m_1(y) = b$, $m_1(z) = c$, and
the intermediate nodes of the paths $[a] \rightsquigarrow [b]$ and $[a] \rightsquigarrow [c]$ are the equivalence classes
of the intermediate nodes of the paths $m_2(x) \rightsquigarrow m_2(y)$ and $m_2(x) \rightsquigarrow m_2(z)$ in $T_2$. Now,
since $m_2$ is a topological embedding, these paths $m_2(x) \rightsquigarrow m_2(y)$ and $m_2(x) \rightsquigarrow m_2(z)$
have no common intermediate node. Since $\ell_2$ is injective, no image of an intermediate
node of the path $m_2(x) \rightsquigarrow m_2(y)$ is equal to the image of an intermediate node of the
path $m_2(x) \rightsquigarrow m_2(z)$, and thus the paths $[a] \rightsquigarrow [b]$ and $[a] \rightsquigarrow [c]$ are divergent.

Therefore, if $\mathrm{Tree}_* = \mathrm{Tree}_{top}$, $\ell_1$ is a topological embedding. □

Theorem 1 below extends the last proposition in the algebraic direction, by showing
that $\mathrm{Tree}_*$-sums are not only common $\mathrm{Tree}_*$-supertrees, but pushouts. In its proof we shall
use several times the following technical fact, which we establish first as a lemma.



Lemma 8. Let $\mathrm{Tree}_*$ be $\mathrm{Tree}_{hom}$, $\mathrm{Tree}_{top}$ or $\mathrm{Tree}_{min}$. Let $T_1$ and $T_2$ be trees, let $T_\mu$ be a
largest common $\mathrm{Tree}_*$-subtree of $T_1$ and $T_2$, let $m_1 : T_\mu \to T_1$ and $m_2 : T_\mu \to T_2$ be any
$\mathrm{Tree}_*$-embeddings, and let $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$ be any $\mathrm{Tree}_*$-embeddings such that

$f_1 \circ m_1 = f_2 \circ m_2.$

There do not exist $x \in V(T_\mu)$, $p \in V(T_1) - m_1(V(T_\mu))$, and $q \in V(T_2) - m_2(V(T_\mu))$
such that $(m_1(x), p) \in E(T_1)$, $(m_2(x), q) \in E(T_2)$, and $f_1(p)$ and $f_2(q)$ are connected by a
path.

Proof. Assume that there exist $x \in V(T_\mu)$, $p \in V(T_1) - m_1(V(T_\mu))$, and
$q \in V(T_2) - m_2(V(T_\mu))$ such that $(m_1(x), p) \in E(T_1)$, $(m_2(x), q) \in E(T_2)$, and there is, say, a path
$f_2(q) \rightsquigarrow f_1(p)$, in such a way that $f_2(q)$ is an intermediate node in the path
$f_1(m_1(x)) \rightsquigarrow f_1(p)$. We shall look for a contradiction.

Under these assumptions, we can enlarge T^ by adding a new node y, a new arc (x, y), 
and replacing by a new arc {y,z) every arc {x,z) such that the path 7771 (x) ^^7771 (z) in Ti 
contains p. It is clear that the graph T^ obtained in this way is a tree, strictly larger than 
T^. We can extend 7771 and 7n,2 to T^ by defining mi{y) = p and 7772(7/) = q. If we prove 
that the mappings 77ii : V{T^) —i- V{Ti) and 77^2 : V{Tfj_) —i- V{T2) defined in this way are 
Tree,,-embeddings 7771 : T^ ^ Ti and 77^2 : T^ -^ T2, this will contradict the assumption 
that T^ is a largest common Tree*-subtree of Ti and T2. 

Now, on the one hand, the arc (x, y) is transformed under 7771 and 7712 into the arcs 
(7771 (x),p) and (77ti(x),g), respectively. Assume now that T^ contains a new arc {y,z). This 
means that T^ contained (x, z) and that p is the first intermediate node of the Tree^<-path 

7771 (x)-^ 7771 (z), which does not have any intermediate node in 777i(F(r^)). This implies that 
there exists in Ti a Tree*-path without intermediate nodes in 77^1 (y(T^)) from p = mi{y) to 
77^1(2;). As far as 7712 goes, note that the arc (x, z) in T^ induces under /i o 7771 a Tree^i-path 
from /i(77ii(x)) = /2 ("1-2(3^)) to fi{mi[z)) = f2{'ni2[z)) that contains fi{p). This path also 
contains f2{<l)-, because this node is contained in the path from /i(77ii(x)) = f2{fn2{x)) to 
flip). So, there exists a Tree*-path f2iQ) '^ f2i^TT'2iz)) , which entails the existence of a Tree*- 
path q'^m2iz) in T2. And since this path is actually a piece of the path 7772 (x)~^ 7772 (z), it 
has no intermediate node in 77^2 (T^(T'^)). 

This shows that 7771 : T^ ^ Ti and 7772 : T^ ^> T2 transform arcs into Tree^-paths 
without any intermediate node coming from T^, and hence that they are Tree^-morphisms 
when Tree* is Tree/jom or Treemm- When Tree* = Treefop, it remains to check that 7771 and 

7772 transform pairs of arcs with the same source node into divergent paths. To do it, note 
first that in this case x has at most one child z such that the path 7771 (x) ^-> miiz) in Ti 
contains p, because the paths in Ti from 77ii(x) to the images under 7771 of the children 
of X diverge. Therefore, the new node y has out-degree at most 1 in T^. So, to prove that 
7771 and 7772 are topological embeddings, it is enough to check that if yi is any child of x 
in T^ other than y, the paths 777j(x) -^ niiiy) and niiix) -^niiiyi) in each Tj diverge. For 
7 = 1 it is obvious, because the path 7n,i(x) -^miiy) is simply the arc (7771 (x),p) and, by 
assumption, p is not contained in the path 77ii(x) ~^777i(yi). As far as the case 7 = 2 goes, 
the path 7772 (x)-^ 7772(7/) is simply the arc (7772 (x),g), and thus it is enough to check that q 



is not contained in the path m2{x) ~~^m2{yi). But the paths from fi{mi{x)) = f2i'm2{x)) 
to flip) and to /i(m.i(yi)) = /2 ("1-2(2/1)) diverge because /i is a topological embedding, 
and therefore, since /2('7) is contained in the fist one, it cannot be contained in the second 
one, which implies that q cannot be contained in the path m2{x) ~^ra2{yi). This finishes 
the proof that, when Tree^, = Treeiop, rai and ra2 are topological embeddings. D 

Theorem 1. Let $T_1$ and $T_2$ be trees, let $T_\mu$ be a largest common $\mathrm{Tree}_*$-subtree of $T_1$ and
$T_2$, and let $m_1 : T_\mu \to T_1$ and $m_2 : T_\mu \to T_2$ be any $\mathrm{Tree}_*$-embeddings.

Then, the $\mathrm{Tree}_*$-sum $T_\sigma$ of $T_1$ and $T_2$ obtained through $m_1$ and $m_2$, together with the
$\mathrm{Tree}_*$-embeddings $\ell_1 : T_1 \to T_\sigma$ and $\ell_2 : T_2 \to T_\sigma$, is a pushout in $\mathrm{Tree}_*$ of $m_1$ and $m_2$.

Proof. It is clear that $\ell_1 \circ m_1 = \ell_2 \circ m_2$. Therefore, it remains to prove that $T_\sigma$, together with the $\mathrm{Tree}_*$-embeddings $\ell_1 : T_1 \to T_\sigma$ and $\ell_2 : T_2 \to T_\sigma$, satisfies the universal property of pushouts in $\mathrm{Tree}_*$.

So, let $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$ be any $\mathrm{Tree}_*$-embeddings such that $f_1 \circ m_1 = f_2 \circ m_2$. It is well known that there exists one, and only one, mapping $f : (V(T_1) \sqcup V(T_2))/\theta \to V(T)$ such that $f \circ \ell_1 = f_1$ and $f \circ \ell_2 = f_2$: namely, the one defined by $f([a]) = f_1(a)$ if $a \in V(T_1)$ and $f([a]) = f_2(a)$ if $a \in V(T_2)$. We must prove that this mapping $f$ is a $\mathrm{Tree}_*$-embedding.
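
As an illustration only, the mapping $f$ can be tabulated from $f_1$ and $f_2$ once the classes of the quotient are given names. The following Python sketch assumes that all mappings are dictionaries on node labels; the function name `induced_map` and the class labels `("mu", x)`, `(1, a)` and `(2, b)` are hypothetical choices made for this example and are not notation from the paper. It only checks that $f$ is well defined; it does not check that $f$ is an embedding.

```python
def induced_map(m1, m2, f1, f2):
    """Tabulate f([a]) = f1(a) or f2(a) on the classes of the merged node set.

    m1, m2: dicts from nodes of the common subtree to nodes of T1 and T2.
    f1, f2: dicts from nodes of T1 and T2 to nodes of T.
    """
    # f is well defined only if f1 o m1 = f2 o m2.
    for x in m1:
        if f1[m1[x]] != f2[m2[x]]:
            raise ValueError("f1 o m1 and f2 o m2 differ on node %r" % (x,))
    merged1 = {m1[x]: x for x in m1}  # nodes of T1 that get merged
    merged2 = {m2[x]: x for x in m2}  # nodes of T2 that get merged
    f = {}
    for a, target in f1.items():
        cls = ("mu", merged1[a]) if a in merged1 else (1, a)
        f[cls] = target
    for b, target in f2.items():
        cls = ("mu", merged2[b]) if b in merged2 else (2, b)
        f[cls] = target
    return f


# Two one-arc trees sharing only their roots (illustrative labels).
example_f = induced_map(
    {"x": "r1"}, {"x": "r2"},
    {"r1": 0, "a": 1}, {"r2": 0, "b": 2},
)
```

In this toy instance the merged root class is sent to node 0 of $T$, and the two unmerged nodes keep the images given by $f_1$ and $f_2$.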

Let us prove first that it is injective. Assume that there exist $v, w \in V(T_\sigma)$, $v \neq w$, such that $f(v) = f(w)$. Since $f_1$ and $f_2$ are injective, it is clear that they cannot be classes of nodes of the same tree $T_i$. Thus, there exist $a \in V(T_1) - m_1(V(T_\mu))$ and $b \in V(T_2) - m_2(V(T_\mu))$ such that $v = [a]$, $w = [b]$ and $f_1(a) = f_2(b)$.

By Lemma 7.(i), the image under some $m_i$ of the root of $T_\mu$ is the root of the corresponding $T_i$. This implies that there exists a path from the image of a node in $T_\mu$ to one of these nodes $a$ or $b$ in the corresponding tree. Moreover, if there exists, say, some $x \in V(T_\mu)$ such that there is a path $m_1(x) \leadsto a$ in $T_1$, then there is a path from $f_1(m_1(x)) = f_2(m_2(x))$ to $f_1(a) = f_2(b)$ in $T$, and hence a path $m_2(x) \leadsto b$ in $T_2$. By symmetry, if there exists some $x \in V(T_\mu)$ such that there is a path $m_2(x) \leadsto b$ in $T_2$, then there is a path $m_1(x) \leadsto a$ in $T_1$.

This shows that there exists a node $x_0 \in V(T_\mu)$ such that there exist paths $m_1(x_0) \leadsto a$ in $T_1$ and $m_2(x_0) \leadsto b$ in $T_2$ without any intermediate node in $m_1(V(T_\mu))$ or $m_2(V(T_\mu))$, respectively. These paths induce, through $f_1$ and $f_2$, the same path from $f_1(m_1(x_0)) = f_2(m_2(x_0))$ to $f_1(a) = f_2(b)$ in $T$ (because of the uniqueness of paths in trees). Let now $e$ be the child of $m_1(x_0)$ contained in the path $m_1(x_0) \leadsto a$ in $T_1$, and $d$ the child of $m_2(x_0)$ in the path $m_2(x_0) \leadsto b$ in $T_2$. Then $f_1(e)$ and $f_2(d)$ are contained in the path from $f_1(m_1(x_0)) = f_2(m_2(x_0))$ to $f_1(a) = f_2(b)$ in $T$, and hence they are connected by a path.

When $\mathrm{Tree}_*$ is $\mathrm{Tree}_{hom}$, $\mathrm{Tree}_{top}$ or $\mathrm{Tree}_{min}$, Lemma 8 says that this situation is impossible, and therefore $f$ must be injective. In the case when $\mathrm{Tree}_* = \mathrm{Tree}_{iso}$, since $f_1$ and $f_2$ transform arcs into arcs, it must happen that $f_1(e) = f_2(d)$. This allows us to enlarge $T_\mu$ by adding a new node $y_0$ and a new arc $(x_0, y_0)$: it is clear that the graph $T'_\mu$ obtained in this way is a tree. We extend $m_1$ and $m_2$ to $T'_\mu$ by defining $m_1(y_0) = e$ and $m_2(y_0) = d$. The mappings $m_1 : V(T'_\mu) \to V(T_1)$ and $m_2 : V(T'_\mu) \to V(T_2)$ defined in this way are isomorphic embeddings $m_1 : T'_\mu \to T_1$ and $m_2 : T'_\mu \to T_2$. Indeed, they are injective because their restrictions to $T_\mu$ are injective and, by assumption, $e \notin m_1(V(T_\mu))$ and $d \notin m_2(V(T_\mu))$, and they transform arcs into arcs because their restrictions to $T_\mu$ do so and $(m_i(x_0), m_i(y_0)) \in E(T_i)$ for each $i = 1, 2$. In this way we obtain a common isomorphic subtree of $T_1$ and $T_2$ that is strictly larger than $T_\mu$, which yields a contradiction. Therefore, $f$ is also injective in this case.

So, $f : V(T_\sigma) \to V(T)$ is always injective. Now, assume $(v, w) \in E(T_\sigma)$. Then, for some $i = 1, 2$, there exist $a, b \in V(T_i)$ such that $v = [a]$, $w = [b]$, and $(a, b) \in E(T_i)$: to fix ideas, assume that $i = 1$. This implies that there is a $\mathrm{Tree}_*$-path from $f(v) = f_1(a)$ to $f(w) = f_1(b)$ in $T$. If $\mathrm{Tree}_* = \mathrm{Tree}_{iso}$, this already proves that $f$ is an isomorphic embedding.

Thus, henceforth, we shall assume that $\mathrm{Tree}_* \neq \mathrm{Tree}_{iso}$. In this case, we must check that no intermediate node of this $\mathrm{Tree}_*$-path $f(v) \leadsto f(w)$ belongs to $f(V(T_\sigma)) = f_1(V(T_1)) \cup f_2(V(T_2))$. Now, $f_1$ being a $\mathrm{Tree}_*$-embedding, we already know that no intermediate node of this path belongs to $f_1(V(T_1))$, and therefore we only have to check that no intermediate node belongs to $f_2(V(T_2))$, either. Before proceeding, note that we have already proved that $f$ sends arcs to $\mathrm{Tree}_*$-paths, and hence that this mapping transforms paths in $T_\sigma$ into paths in $T$.

Assume that there is some $c \in V(T_2)$ such that $f_2(c)$ is an intermediate node of the path $f_1(a) \leadsto f_1(b)$ in $T$. This prevents the existence of paths $[c] \leadsto [a]$ or $[b] \leadsto [c]$ in $T_\sigma$: the image of such a path under $f$ would be a path in $T$ that would form a circuit with the path from $f_1(a) = f([a])$ to $f_2(c) = f([c])$ or from $f_2(c) = f([c])$ to $f_1(b) = f([b])$, respectively, that we already know to exist. Moreover, $c \notin m_2(V(T_\mu))$, because if $c \in m_2(V(T_\mu))$, then $f_2(c) \in f_2(m_2(V(T_\mu))) = f_1(m_1(V(T_\mu))) \subseteq f_1(V(T_1))$.

After excluding these possibilities, we still must discuss several cases:

— $a = m_1(x)$ and $b = m_1(y)$ for some $x, y \in V(T_\mu)$. In this case, by Lemma 2, the existence of an arc from $\ell_2(m_2(x)) = \ell_1(m_1(x)) = [a]$ to $\ell_2(m_2(y)) = \ell_1(m_1(y)) = [b]$ implies the existence of an arc from $m_2(x)$ to $m_2(y)$ in $T_2$. Since $f_2$ is a $\mathrm{Tree}_*$-embedding, the path from $f_2(m_2(x)) = f_1(a)$ to $f_2(m_2(y)) = f_1(b)$ does not contain any intermediate node in $f_2(V(T_2))$, which contradicts the existence of $c$.

— $a = m_1(x)$ for some $x \in V(T_\mu)$, but $b \notin m_1(V(T_\mu))$. In this case, since $f_2$ is a $\mathrm{Tree}_*$-embedding, the existence of a $\mathrm{Tree}_*$-path $f_2(m_2(x)) = f_1(a) \leadsto f_2(c)$ in $T$ implies, by Corollary 2, the existence of a $\mathrm{Tree}_*$-path $m_2(x) \leadsto c$ in $T_2$. And this path cannot have any intermediate node in $m_2(V(T_\mu))$: any intermediate node in this set would become, under $f_2$, an intermediate node in $f_2(m_2(V(T_\mu))) = f_1(m_1(V(T_\mu))) \subseteq f_1(V(T_1))$ of the path $f_2(m_2(x)) \leadsto f_2(c)$. Let $d$ be the child of $m_2(x)$ contained in this path $m_2(x) \leadsto c$. The path $f_2(m_2(x)) = f_1(a) \leadsto f_2(c)$ contains $f_2(d)$, and therefore $f_2(d)$ is an intermediate node of the path $f_1(a) \leadsto f_1(b)$. But then this situation is impossible by Lemma 8.

— $a \notin m_1(V(T_\mu))$. Since, by Lemma 7.(i), the image under $m_1$ or $m_2$ of the root of $T_\mu$ is the root of $T_1$ or $T_2$, respectively, we know that there exists some $x \in V(T_\mu)$ such that there is a path $m_1(x) \leadsto a$ in $T_1$ or a path $m_2(x) \leadsto c$ in $T_2$. It turns out that the existence of either one of these paths implies the existence of both paths $m_1(x) \leadsto a$ and $m_2(x) \leadsto c$ in $T_1$ and $T_2$, respectively. Indeed, if there exists a path $m_1(x) \leadsto a$, then there is a path $f_1(m_1(x)) \leadsto f_1(a)$ in $T$, which, composed with the path $f_1(a) \leadsto f_2(c)$, yields a path $f_2(m_2(x)) = f_1(m_1(x)) \leadsto f_2(c)$, and this path, in turn, implies a path $m_2(x) \leadsto c$ in $T_2$. Conversely, if there exists a path $m_2(x) \leadsto c$, then there is a path from $f_2(m_2(x))$ to $f_2(c)$ in $T$. Since there is also a path $f_1(a) \leadsto f_2(c)$ and $f_2(m_2(x)) = f_1(m_1(x))$ cannot be an intermediate node of the path $f_1(a) \leadsto f_2(c)$ (because this path does not contain any intermediate node in $f_1(V(T_1))$), it must happen that $f_1(a)$ is intermediate in the path $f_2(m_2(x)) \leadsto f_2(c)$, that is, that there is a path $f_1(m_1(x)) = f_2(m_2(x)) \leadsto f_1(a)$ which, finally, implies a path $m_1(x) \leadsto a$ in $T_1$.
So, we can take $x \in V(T_\mu)$ such that, on the one hand, there exist paths $m_1(x) \leadsto a$ and $m_2(x) \leadsto c$ in $T_1$ and $T_2$ and, on the other hand, there do not exist paths $m_1(y) \leadsto a$ in $T_1$ or $m_2(y) \leadsto c$ in $T_2$ for any child $y$ of it. Let then $e$ be the child of $m_1(x)$ contained in the path $m_1(x) \leadsto a$ in $T_1$, and $d$ the child of $m_2(x)$ contained in the path $m_2(x) \leadsto c$ in $T_2$. The uniqueness of paths in $T$ implies that the path $f_2(m_2(x)) \leadsto f_2(c)$, which contains $f_2(d)$, is the concatenation of the path $f_1(m_1(x)) \leadsto f_1(a)$, which contains $f_1(e)$, and the path $f_1(a) \leadsto f_2(c)$. Therefore, $f_1(e)$ and $f_2(d)$ are connected by a path. By Lemma 8, this situation cannot happen.

Therefore, $f$ transforms arcs into $\mathrm{Tree}_*$-paths without intermediate nodes in $f(V(T_\sigma))$, and thus it is a $\mathrm{Tree}_*$-embedding when $\mathrm{Tree}_*$ is $\mathrm{Tree}_{hom}$ or $\mathrm{Tree}_{min}$. This proves the universal property of pushouts, and with it the statement, for these categories. It remains to prove it in $\mathrm{Tree}_{top}$.

So far, we know that, if we are in $\mathrm{Tree}_{top}$, then $f$ transforms arcs into paths without intermediate nodes in $f(V(T_\sigma))$. Now we must prove that it transforms arcs with the same source node into divergent paths. So, assume there are arcs $(v, w)$ and $(v, u)$ in $T_\sigma$ with $w \neq u$.

If these arcs are induced by arcs in the same tree, i.e., if there exist $(a, b), (a, c) \in E(T_i)$ for some $i = 1, 2$, such that $v = [a]$, $w = [b]$ and $u = [c]$, then, since $f_i$ is a topological embedding, the paths from $f(v) = f_i(a)$ to $f(w) = f_i(b)$ and to $f(u) = f_i(c)$ are divergent. Now consider the case when each one of these arcs is induced by an arc in a different tree. In this case, there exist $x \in V(T_\mu)$, $b \in V(T_1)$ and $c \in V(T_2)$ such that, say, $v = [m_1(x)] = [m_2(x)]$, $w = [b]$ and $u = [c]$, and there are arcs $(m_1(x), b) \in E(T_1)$ and $(m_2(x), c) \in E(T_2)$.

If there exists $y \in V(T_\mu)$ such that $m_1(y) = b$, then, by Lemma 2, $(x, y) \in E(T_\mu)$ and hence there exists a path $m_2(x) \leadsto m_2(y)$ in $T_2$. But since there is an arc from $[m_2(x)] = [m_1(x)]$ to $[m_2(y)] = [b]$ in $T_\sigma$, the path $m_2(x) \leadsto m_2(y)$ in $T_2$ must also be an arc (otherwise, it would induce a path in $T_{po}$ that would have caused the arc $(v, w)$ to be removed in the construction of $T_\sigma$). Therefore, the arc $(v, w)$ is induced by the arc $(m_2(x), m_2(y))$ in $T_2$, and thus both arcs $(v, w)$ and $(v, u)$ are induced by arcs in $T_2$ and the paths $f(v) \leadsto f(w)$ and $f(v) \leadsto f(u)$ are divergent, as we have just seen. In a similar way, if there exists $y \in V(T_\mu)$ such that $m_2(y) = c$, then both arcs $(v, w)$ and $(v, u)$ are induced by arcs in $T_1$ and the paths $f(v) \leadsto f(w)$ and $f(v) \leadsto f(u)$ are divergent.

Consider finally the case when neither $b$ nor $c$ has a preimage in $T_\mu$. There are two possibilities to discuss:



— If there exists an arc $(x, z) \in E(T_\mu)$ such that $b$ is the first intermediate node of the path $m_1(x) \leadsto m_1(z)$, then $w = [b]$ is the first intermediate node of the path $[m_1(x)] \leadsto [m_1(z)]$. In particular, $u = [c]$ does not appear in this last path, which implies that the arc $(m_2(x), c)$ and the path $m_2(x) \leadsto m_2(z)$ are divergent. Since $f_2$ is a topological embedding, the paths in $T$ from $f_2(m_2(x))$ to $f_2(c)$ and from $f_2(m_2(x)) = f_1(m_1(x))$ to $f_2(m_2(z)) = f_1(m_1(z))$ are also divergent. Since $f_1(b)$ is contained in this last path, we finally deduce that the paths from $f(v) = f_2(m_2(x)) = f_1(m_1(x))$ to $f(w) = f_1(b)$ and to $f(u) = f_2(c)$ are divergent.

The case when there exists an arc $(x, z) \in E(T_\mu)$ such that $c$ is the first intermediate node of the path $m_2(x) \leadsto m_2(z)$ is solved in a similar way.

— If there is no arc $(x, z)$ in $T_\mu$ such that $b$ or $c$ are intermediate nodes of the paths $m_1(x) \leadsto m_1(z)$ or $m_2(x) \leadsto m_2(z)$, respectively, then we can enlarge $T_\mu$ by adding to it a new node $y_0$ and an arc $(x, y_0)$, and we can extend $m_1$ and $m_2$ to this new tree by defining $m_1(y_0) = b$ and $m_2(y_0) = c$. It is straightforward to prove that in this way we obtain a common topological subtree of $T_1$ and $T_2$ strictly larger than $T_\mu$, which contradicts the assumption that $T_\mu$ is a largest common topological subtree of $T_1$ and $T_2$. So, this possibility cannot happen.

This finishes the proof for $\mathrm{Tree}_{top}$. □

Remark 2. To frame the last result, it is interesting to note that no category $\mathrm{Tree}_*$ considered in this paper has all binary pushouts, essentially because the category of sets with injective mappings as morphisms does not have all binary pushouts, either. As a matter of fact, the simplest counter-example does not involve arcs at all. Let $S$ be the empty tree and, for every $i = 1, 2$, let $T_i$ be the tree consisting of a single node $a_i$ and no arc, and let $m_i : V(S) \to V(T_i)$ be the empty mapping. It is clear that each $m_i$ is a $\mathrm{Tree}_*$-embedding, for every category $\mathrm{Tree}_*$. Now, assume that $m_1 : S \to T_1$ and $m_2 : S \to T_2$ have a pushout $(P, g_1 : T_1 \to P, g_2 : T_2 \to P)$ in $\mathrm{Tree}_*$.

Consider the tree $P'$ consisting of two nodes $a_1, a_2$ and no arc and the mappings $g'_1 : V(T_1) \to V(P')$ and $g'_2 : V(T_2) \to V(P')$ defined by $g'_1(a_1) = a_1$ and $g'_2(a_2) = a_2$. These mappings are $\mathrm{Tree}_*$-embeddings, for every category $\mathrm{Tree}_*$. Since $g'_1 \circ m_1 = g'_2 \circ m_2$, by the universal property of pushouts there exists a $\mathrm{Tree}_*$-embedding $g' : P \to P'$ such that $g' \circ g_1 = g'_1$ and $g' \circ g_2 = g'_2$: in particular, $g'(g_1(a_1)) = a_1 \neq a_2 = g'(g_2(a_2))$, and therefore $g_1(a_1) \neq g_2(a_2)$.

Consider now the tree $P''$ consisting of a single node $a$ and no arc and the mappings $g''_1 : V(T_1) \to V(P'')$ and $g''_2 : V(T_2) \to V(P'')$ defined by $g''_1(a_1) = g''_2(a_2) = a$. Again, these mappings are $\mathrm{Tree}_*$-embeddings, for every category $\mathrm{Tree}_*$, and they satisfy $g''_1 \circ m_1 = g''_2 \circ m_2$. Then, by the universal property of pushouts, there exists a $\mathrm{Tree}_*$-embedding $g'' : P \to P''$ such that $g'' \circ g_1 = g''_1$ and $g'' \circ g_2 = g''_2$. But then $g''(g_1(a_1)) = g''_1(a_1) = a = g''_2(a_2) = g''(g_2(a_2))$, and hence $g''$ is not injective. Therefore, it cannot be a $\mathrm{Tree}_*$-embedding, which yields a contradiction.

This shows that $m_1$ and $m_2$ do not have a pushout in any category $\mathrm{Tree}_*$. Of course, in this case $S$ is not a largest common $\mathrm{Tree}_*$-subtree of $T_1$ and $T_2$.
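
The counter-example of this remark can also be checked mechanically at the level of node sets. The following Python sketch is only an illustration, under the assumption that arcs can be ignored (as the remark itself observes); the function names and the bound `max_size` are hypothetical. For every candidate node set $P$ with up to `max_size` nodes and every choice of images of $a_1$ and $a_2$ in $P$, it counts the injective mediating mappings into the two test objects $P'$ and $P''$, and reports whether some candidate admits exactly one of each.

```python
from itertools import permutations, product


def injective_mediators(p_nodes, g1, g2, q_nodes, q1, q2):
    """All injective maps h: P -> Q with h(g1) = q1 and h(g2) = q2."""
    found = []
    # permutations(q_nodes, k) enumerates the injective maps from a k-set.
    for image in permutations(q_nodes, len(p_nodes)):
        h = dict(zip(p_nodes, image))
        if h[g1] == q1 and h[g2] == q2:
            found.append(h)
    return found


def has_pushout_candidate(max_size=4):
    """Search for (P, g1, g2) passing both universal-property tests."""
    for n in range(1, max_size + 1):
        p_nodes = list(range(n))
        for g1, g2 in product(p_nodes, repeat=2):
            # Test object P': two nodes, the images of a1 and a2 differ.
            into_two = injective_mediators(
                p_nodes, g1, g2, ["a1", "a2"], "a1", "a2")
            # Test object P'': one node, both images coincide.
            into_one = injective_mediators(
                p_nodes, g1, g2, ["a"], "a", "a")
            if len(into_two) == 1 and len(into_one) == 1:
                return True
    return False
```

The search finds no candidate: an injective mapping into $P''$ forces $P$ to have a single node, while a mediating mapping into $P'$ forces the two images to be different nodes of $P$.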



5 Largest common subtrees and smallest common supertrees 

Let $\mathrm{Tree}_*$ still denote any category $\mathrm{Tree}_{iso}$, $\mathrm{Tree}_{hom}$, $\mathrm{Tree}_{top}$ or $\mathrm{Tree}_{min}$. In this section, we show that the constructions presented in the last two sections can be used to obtain largest common $\mathrm{Tree}_*$-subtrees and smallest common $\mathrm{Tree}_*$-supertrees of pairs of trees. The key will be the following result.

Lemma 9. Let $T_1$ and $T_2$ be two trees, and let $T_\mu$ be a largest common $\mathrm{Tree}_*$-subtree of them. For every common $\mathrm{Tree}_*$-supertree $T$ of $T_1$ and $T_2$, we have that $|V(T)| \geq |V(T_1)| + |V(T_2)| - |V(T_\mu)|$.

Proof. Propositions 1, 2, 3, and 4 show that, for every two $\mathrm{Tree}_*$-embeddings $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$, there exists a common $\mathrm{Tree}_*$-subtree $T_0$ of $T_1$ and $T_2$ with set of nodes containing $f_1(V(T_1)) \cap f_2(V(T_2))$: after a relabeling of the nodes (so that $f_1$ and $f_2$ are given by inclusions of the sets of nodes), it will be the intersection $T_p$ of $T_1$ and $T_2$ in $\mathrm{Tree}_{iso}$, $\mathrm{Tree}_{hom}$, and $\mathrm{Tree}_{top}$, and its one-node extension in $\mathrm{Tree}_{min}$. Then,

$$|f_1(V(T_1)) \cap f_2(V(T_2))| \leq |V(T_0)| \leq |V(T_\mu)|$$

and hence,

$$\begin{aligned}
|V(T)| &\geq |f_1(V(T_1)) \cup f_2(V(T_2))| \\
&= |f_1(V(T_1))| + |f_2(V(T_2))| - |f_1(V(T_1)) \cap f_2(V(T_2))| \\
&\geq |V(T_1)| + |V(T_2)| - |V(T_0)| \\
&\geq |V(T_1)| + |V(T_2)| - |V(T_\mu)|,
\end{aligned}$$

as we claimed. □
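
The counting step in this proof is the inclusion-exclusion identity for the two images in $V(T)$. As a small numerical illustration, the following Python sketch computes both sides for embeddings given as dictionaries; the function name `image_union_size` and the node labels are hypothetical, and only injectivity of the mappings is checked, not the embedding conditions.

```python
def image_union_size(f1, f2):
    """Return (|f1(V1) u f2(V2)|, |f1(V1) n f2(V2)|) for injective maps."""
    img1, img2 = set(f1.values()), set(f2.values())
    if len(img1) != len(f1) or len(img2) != len(f2):
        raise ValueError("embeddings must be injective on nodes")
    shared = len(img1 & img2)
    union = len(img1 | img2)
    # Inclusion-exclusion, using |f_i(V(T_i))| = |V(T_i)| by injectivity.
    assert union == len(f1) + len(f2) - shared
    return union, shared


# A 3-node tree and a 2-node tree whose images share one node of T.
sizes = image_union_size({"a": 0, "b": 1, "c": 2}, {"u": 0, "v": 3})
```

Here the images cover $3 + 2 - 1 = 4$ nodes, so any tree $T$ containing both images has at least 4 nodes.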

Theorem 2. For every pair of trees $T_1$ and $T_2$, any $\mathrm{Tree}_*$-sum of $T_1$ and $T_2$ is a smallest common $\mathrm{Tree}_*$-supertree of them.

Proof. By Proposition 8, any $\mathrm{Tree}_*$-sum $T_\sigma$ of $T_1$ and $T_2$ is a common $\mathrm{Tree}_*$-supertree of them, and by construction

$$|V(T_\sigma)| = |V(T_1)| + |V(T_2)| - |V(T_\mu)|,$$

for some largest common $\mathrm{Tree}_*$-subtree $T_\mu$ of them. Thus, $T_\sigma$ achieves the lower bound established in Lemma 9 for common $\mathrm{Tree}_*$-supertrees of $T_1$ and $T_2$, which implies that it is a smallest common $\mathrm{Tree}_*$-supertree of them. □

Theorem 3. For every two trees $T_1$ and $T_2$, any intersection of $T_1$ and $T_2$ obtained through $\mathrm{Tree}_*$-embeddings into a smallest common $\mathrm{Tree}_*$-supertree of them is a largest common $\mathrm{Tree}_*$-subtree of $T_1$ and $T_2$.



Proof. Let $T_1$ and $T_2$ be two trees, let $T'_\sigma$ be a smallest common $\mathrm{Tree}_*$-supertree of $T_1$ and $T_2$, let $p_1 : T_1 \to T'_\sigma$ and $p_2 : T_2 \to T'_\sigma$ be any $\mathrm{Tree}_*$-embeddings, and let $T'_p$ be any common $\mathrm{Tree}_*$-subtree of $T_1$ and $T_2$ obtained by expanding the intersection $T_p$ of $T_1$ and $T_2$ obtained through $p_1$ and $p_2$, which exists by Propositions 1, 2, 3, and 4.

Now, by Theorem 2 we have that, for any largest common $\mathrm{Tree}_*$-subtree $T_\mu$ of $T_1$ and $T_2$,

$$|V(T'_\sigma)| = |V(T_1)| + |V(T_2)| - |V(T_\mu)|$$

and we know that

$$|p_1(V(T_1)) \cap p_2(V(T_2))| \leq |V(T'_p)| \leq |V(T_\mu)|.$$

Then,

$$\begin{aligned}
|V(T_1)| + |V(T_2)| - |V(T_\mu)| &= |V(T'_\sigma)| \geq |p_1(V(T_1)) \cup p_2(V(T_2))| \\
&= |p_1(V(T_1))| + |p_2(V(T_2))| - |p_1(V(T_1)) \cap p_2(V(T_2))| \\
&\geq |V(T_1)| + |V(T_2)| - |V(T'_p)| \\
&\geq |V(T_1)| + |V(T_2)| - |V(T_\mu)|.
\end{aligned}$$

This implies that $|V(T'_p)| = |V(T_\mu)| = |p_1(V(T_1)) \cap p_2(V(T_2))|$. From these equalities we deduce, on the one hand, that $T'_p$ is also a largest common $\mathrm{Tree}_*$-subtree of $T_1$ and $T_2$, and on the other hand, that $V(T'_p) = p_1(V(T_1)) \cap p_2(V(T_2))$, i.e., that $T'_p = T_p$, as we claimed. □

Thus, for every pair of trees $T_1$ and $T_2$, the pushout in $\mathrm{Tree}_*$ of any $\mathrm{Tree}_*$-embeddings from a largest common $\mathrm{Tree}_*$-subtree of them yields a smallest common $\mathrm{Tree}_*$-supertree of them, and the pullback in $\mathrm{Tree}_*$ of any $\mathrm{Tree}_*$-embeddings into a smallest common $\mathrm{Tree}_*$-supertree of them yields a largest common $\mathrm{Tree}_*$-subtree of them. Moreover, all smallest common $\mathrm{Tree}_*$-supertrees and all largest common $\mathrm{Tree}_*$-subtrees are obtained in this way up to isomorphisms, as the following corollaries show.

Corollary 3. Every smallest common $\mathrm{Tree}_*$-supertree of a pair of trees $T_1$ and $T_2$ is, up to an isomorphism, the $\mathrm{Tree}_*$-sum of $T_1$ and $T_2$ obtained through the embeddings of a largest common $\mathrm{Tree}_*$-subtree into them.

Proof. Let $T_1$ and $T_2$ be two trees, let $T'_\sigma$ be a smallest common $\mathrm{Tree}_*$-supertree of $T_1$ and $T_2$ and let $p_1 : T_1 \to T'_\sigma$ and $p_2 : T_2 \to T'_\sigma$ be any $\mathrm{Tree}_*$-embeddings. By Theorem 3, the intersection $T_p$ of $T_1$ and $T_2$ obtained through $p_1$ and $p_2$, together with the corresponding inclusions $\iota_1 : T_p \to T_1$ and $\iota_2 : T_p \to T_2$, is a largest common $\mathrm{Tree}_*$-subtree of $T_1$ and $T_2$. Let now $T_\sigma$, together with $m_1 : T_1 \to T_\sigma$ and $m_2 : T_2 \to T_\sigma$, be the sum of $T_1$ and $T_2$ obtained through $\iota_1$ and $\iota_2$. By Theorem 2, $T_\sigma$ is a smallest common $\mathrm{Tree}_*$-supertree of $T_1$ and $T_2$, and by Theorem 1, $(T_\sigma, m_1 : T_1 \to T_\sigma, m_2 : T_2 \to T_\sigma)$ is a pushout of $\iota_1 : T_p \to T_1$ and $\iota_2 : T_p \to T_2$ in $\mathrm{Tree}_*$. Since $p_1 \circ \iota_1 = p_2 \circ \iota_2$, by the universal property of pushouts there exists a $\mathrm{Tree}_*$-embedding $p : T_\sigma \to T'_\sigma$ such that $p \circ m_1 = p_1$ and $p \circ m_2 = p_2$. Now, $T_\sigma$ and $T'_\sigma$ have the same size, because they are both smallest common $\mathrm{Tree}_*$-supertrees of $T_1$ and $T_2$. Therefore, $p : T_\sigma \to T'_\sigma$ is bijective, and thus an isomorphism by Lemma 4. □



A similar argument, which we leave to the reader, proves also the following result. 

Corollary 4. Every largest common $\mathrm{Tree}_*$-subtree of a pair of trees $T_1$ and $T_2$ is, up to an isomorphism, the intersection of $T_1$ and $T_2$ obtained through their embeddings into a smallest common $\mathrm{Tree}_*$-supertree.

Corollary 5. The problems of finding a largest common $\mathrm{Tree}_*$-subtree and a smallest common $\mathrm{Tree}_*$-supertree of two trees, in each case together with a pair of witness $\mathrm{Tree}_*$-embeddings, are reducible to each other in time linear in the size of the trees.

Proof. Given two trees $T_1$ and $T_2$, if we know a largest common $\mathrm{Tree}_*$-subtree $T_\mu$ of them, together with a pair of witness $\mathrm{Tree}_*$-embeddings $m_1 : T_\mu \to T_1$ and $m_2 : T_\mu \to T_2$, then the construction of the pushout

$$(T_\sigma, \ell_1 : T_1 \to T_\sigma, \ell_2 : T_2 \to T_\sigma)$$

of $m_1$ and $m_2$ described in Theorem 1 gives a smallest common $\mathrm{Tree}_*$-supertree of $T_1$ and $T_2$, and this construction can be obtained in time linear in the size of $T_1$ and $T_2$, as follows.

First, make copies $T'_1$ and $T'_2$ of $T_1$ and $T_2$, with $\ell_1 : T_1 \to T'_1$ and $\ell_2 : T_2 \to T'_2$ identity mappings. Second, sum up $T'_1$ and $T'_2$ into a graph $T_\sigma$. Third, for each $a \in V(T_\mu)$, merge nodes $\ell_1(m_1(a))$ and $\ell_2(m_2(a))$, and remove all parallel arcs.

Next, remove from $T_\sigma$ all arcs subsumed by paths, as follows. For each node $y \in V(T_\sigma)$ of in-degree 2, let $x, x' \in V(T_\sigma)$ be the source nodes of the two arcs coming into $y$. Now, perform a simultaneous traversal of the paths of arcs coming into $x$ and $x'$, until reaching node $x'$ along the first path or $x$ along the second path. The simultaneous traversal of incoming paths may stop along either path, but continue along the other one, because a node of in-degree 0 or in-degree 2 is reached. Finally, remove from $T_\sigma$ either arc $(x', y)$, if node $x'$ was reached along the first path, or arc $(x, y)$, if node $x$ was reached along the second path.
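
The steps above can be sketched in Python as follows. This is a simplified illustration and not the linear-time procedure just described: it is assumed here that trees are given as sets of (parent, child) arcs, and the simultaneous backward traversal is replaced by a plain reachability search from the source of each arc, which is easier to state but may take quadratic time. The function name `tree_sum` and the labels `("mu", a)`, `(1, v)` and `(2, v)` are hypothetical.

```python
def tree_sum(nodes1, arcs1, nodes2, arcs2, m1, m2):
    """Sum of two trees along embeddings m1, m2 of a common subtree.

    nodes*: iterables of node labels; arcs*: sets of (parent, child) pairs;
    m1, m2: dicts from nodes of the common subtree to nodes of each tree.
    """
    # Steps 1-2: disjoint copies, obtained by tagging nodes with their tree.
    name1 = {v: (1, v) for v in nodes1}
    name2 = {v: (2, v) for v in nodes2}
    # Step 3: merge the two images of every node of the common subtree.
    for a in m1:
        name1[m1[a]] = ("mu", a)
        name2[m2[a]] = ("mu", a)
    nodes = set(name1.values()) | set(name2.values())
    # Storing the arcs in a set removes parallel arcs.
    arcs = {(name1[u], name1[v]) for u, v in arcs1}
    arcs |= {(name2[u], name2[v]) for u, v in arcs2}

    children = {}
    for u, v in arcs:
        children.setdefault(u, set()).add(v)

    def other_path_exists(arc):
        """Is there a path from arc[0] to arc[1] that does not use arc?"""
        src, dst = arc
        stack, seen = [src], {src}
        while stack:
            u = stack.pop()
            for v in children.get(u, ()):
                if (u, v) == arc or v in seen:
                    continue
                if v == dst:
                    return True
                seen.add(v)
                stack.append(v)
        return False

    # Step 4: drop every arc subsumed by a longer path.
    kept = {arc for arc in arcs if not other_path_exists(arc)}
    return nodes, kept, name1, name2


# T1 is the arc r -> a, T2 is the path s -> b -> c, and the common
# subtree x -> y is sent to (r, a) in T1 and to (s, c) in T2.
sum_nodes, sum_arcs, ell1, ell2 = tree_sum(
    {"r", "a"}, {("r", "a")},
    {"s", "b", "c"}, {("s", "b"), ("b", "c")},
    {"x": "r", "y": "a"}, {"x": "s", "y": "c"},
)
```

In this example the arc contributed by $T_1$ is subsumed by the two-arc path contributed by $T_2$, so the result is a path with $2 + 3 - 2 = 3$ nodes.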

Conversely, if we know a smallest common $\mathrm{Tree}_*$-supertree $T$ of $T_1$ and $T_2$, together with a pair of witness $\mathrm{Tree}_*$-embeddings $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$, then, by Theorem 3, the pullback

$$(T_p, \iota_1 : T_p \to T_1, \iota_2 : T_p \to T_2)$$

of $f_1$ and $f_2$ described in Section 3 yields a largest common $\mathrm{Tree}_*$-subtree of $T_1$ and $T_2$, and this construction can also be obtained in time linear in the size of $T_1$ and $T_2$, as follows.

First, make a copy $T_p$ of $T$, with $g : T \to T_p$ the identity mapping. Second, for each $a \in V(T_1)$, mark $g(f_1(a))$ in $T_p$. Third, for each $a \in V(T_2)$, if $g(f_2(a))$ is already marked in $T_p$, double-mark it. Next, for each node of $T_p$ which is not double-marked, add a new arc from its parent (if any) to each of its children (if any) in $T_p$, and remove the node not double-marked. Finally, set mappings $\iota_i : T_p \to T_i$ for $i = 1, 2$, as follows: for each $a \in V(T_i)$, if $g(f_i(a))$ is defined, set $\iota_i(g(f_i(a))) = a$. □
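
The marking and contraction steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the linear-time procedure itself: the supertree is assumed to be given by a hypothetical `parent` dictionary (the root maps to `None`), and instead of contracting nodes one at a time, each double-marked node is attached to its nearest double-marked ancestor, which yields the same arcs but may walk long ancestor chains repeatedly. The function name `intersection_tree` is hypothetical.

```python
def intersection_tree(parent, f1, f2):
    """Contract a supertree onto the nodes hit by both embeddings.

    parent: dict node -> parent node in T (the root maps to None);
    f1, f2: dicts from nodes of T1 and T2 to nodes of T.
    """
    marked = set(f1.values())
    double = {v for v in f2.values() if v in marked}  # double-marked nodes

    def nearest_kept_ancestor(v):
        p = parent[v]
        while p is not None and p not in double:
            p = parent[p]
        return p

    # Removing an unmarked node and reconnecting its children to its parent
    # amounts to linking every kept node to its nearest kept ancestor.
    arcs = set()
    for v in double:
        p = nearest_kept_ancestor(v)
        if p is not None:
            arcs.add((p, v))
    # The mappings iota_i send a kept node back to its preimage in T_i.
    iota1 = {t: a for a, t in f1.items() if t in double}
    iota2 = {t: a for a, t in f2.items() if t in double}
    return double, arcs, iota1, iota2


# T is the path r -> b -> c; T1 = r1 -> a1 is embedded onto (r, c) and
# T2 = r2 -> b2 -> c2 is embedded onto the whole path.
kept_nodes, kept_arcs, iota1, iota2 = intersection_tree(
    {"r": None, "b": "r", "c": "b"},
    {"r1": "r", "a1": "c"},
    {"r2": "r", "b2": "b", "c2": "c"},
)
```

Here node `b` is only marked once, so it is contracted, and the result is the single arc from `r` to `c` together with its two inclusions.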



6 Conclusion 

Subtree isomorphism and the related problems of largest common subtree and smallest 
common supertree belong to the most widely used techniques for comparing tree-structured 
data, with practical applications in combinatorial pattern matching, pattern recognition, 
chemical structure search, computational molecular biology, and other areas of engineering 
and life sciences. Four different embedding relations are of interest in these application 
areas: isomorphic, homeomorphic, topological, and minor embeddings. 

The complexity of the largest common subtree problem and the smallest common su- 
pertree problem under these embedding relations is already settled: they are polynomial- 
time solvable for isomorphic, homeomorphic, and topological embeddings, and they are 
NP-complete for minor embeddings. Moreover, efficient algorithms are known for largest 
common subtree under isomorphic, homeomorphic, and topological embeddings, and for 
smallest common supertree under isomorphic and topological embeddings, and an expo- 
nential algorithm is known for largest common subtree under minor embeddings. 

In this paper, we have established the relationship between the largest common subtree 
and the smallest common supertree of two trees by means of simple constructions, which 
allow one to obtain the largest common subtree from the smallest common supertree, and 
vice versa. We have given these constructions for isomorphic, homeomorphic, topological, 
and minor embeddings, and have shown their implementation in time linear in the size of 
the trees. In doing so, we have filled the gap by providing a simple extension of previous 
largest common subtree algorithms for solving the smallest common supertree problem, in 
particular under homeomorphic and minor embeddings for which no algorithm has been 
known previously. 

Besides the practical interest of these extensions to previous algorithms, we have provided
a unified algebraic construction showing the relation between largest common subtrees
and smallest common supertrees for the four different embedding problems studied
in the literature: isomorphic, homeomorphic, topological, and minor embeddings. The uni- 
fied construction shows that smallest common supertrees are pushouts and largest common 
subtrees are pullbacks. 